EXECUTIVE SUMMARY

The  overall  goal  of  this  project  was  to  understand  extremist  ideological  influences  underlying 
terrorist  and  insurgent  behavior,  in  a  way  that  supports  the  future  development  of  predictive 
models  of  adversary  decision  making.  The  National  Military  Strategic  Plan  for  the  War  on 
Terrorism  identifies  extremist  ideology  as  the  enemy’s  strategic  center  of  gravity,  and  the 
Department  of  Defense  (DOD)  plays  a  significant  role  in  establishing  an  environment 
unfavorable  to  extremist  ideas,  recruiting,  and  support  (Wald,  2006).  Yet,  the  specific  ideological 
characteristics  that  serve  as  enablers  for  extreme  action  have  not  been  well  understood.  Two 
cultural  studies  were  conducted  to  clarify  these  ideological  characteristics. 

The  overall  study  approach  was  to  collect  knowledge  source  materials  from  the  World  Wide 
Web.  The  World  Wide  Web  is  a  rich  source  of  cultural  information.  Aside  from  resources  that 
provide  background  information  about  cultural  groups  (e.g.,  their  history,  geographical  location 
and  ethno-linguistic  properties),  the  Web  also  provides  important  clues  as  to  the  beliefs,  attitudes 
and  values  of  group  members.  Once  the  knowledge  sources  were  collected  and  translated, 
relevant  passages  were  then  extracted  and  analyzed  in  order  to  elucidate  the  ideological 
characteristics  within  them.  In  Study  1,  human  cultural  researchers  conducted  the  analysis,  and  a 
new  model  of  beliefs-values  related  to  extremist  thought  was  constructed.  In  Study  2,  the  model 
was  further  tested  by  using  computational  text  analysis  methods  to  aid  in  analyzing  sentiment 
from  the  web-based  sources.  Such  methods  remain  at  an  early  research  phase  of  development,  so 
new  approaches  and  techniques  were  developed  for  accomplishing  automated  sentiment  analysis. 
Study 2 served both to corroborate the model of extremist beliefs-values devised in Study 1 and
to advance the state of the art in automated sentiment analysis.

The  current  advances  in  automated  sentiment  analysis  have  been  essential  for  applications  to 
cultural  modeling.  Recent  research  in  cultural  modeling  techniques  has  emphasized  new  ways  of 
representing  cultural  knowledge  (Sieck,  Rasmussen,  &  Smart,  2010a).  These  representation 
formats  have  further  led  to  novel  developments  in  areas  of  semi-structured  and  structured 
elicitation  methods  for  direct  human  data  collection  (Sieck,  Rasmussen,  Smith,  &  Kakar,  2010c), 
as  well  as  in  simulating  influences  of  information  on  culturally-shared  beliefs  (Sieck,  Simpkins, 
&  Rasmussen,  2011).  A  computational  method  for  measuring  cultural  values  in  web-based 
resources  adds  another  significant  component  to  the  cultural  analysts’  toolkit.  By  providing  a 
means  to  extract  and  quantify  the  cultural  values  embedded  in  large  and  increasing  volumes  of 
text  being  generated  on  the  web,  the  present  work  moves  a  step  closer  to  the  realization  of  a 
“social  radar”  for  monitoring  and  modeling  changes  in  the  sentiments  of  citizens  and  leaders 
(Maybury,  2010). 


INTRODUCTION

Advances in understanding the reasons behind terrorism have been made in the last several
years, though the evidential research base remains thin (Atran & Sageman, 2006). Before
discussing  potential  causes  of  terrorism,  however,  it  is  useful  to  offer  a  definition.  The  DOD 
defines  terrorism  as: 

The  calculated  use  of  violence  or  threat  of  violence  to  attain  goals,  political, 
religious,  or  ideological  in  nature.  This  is  done  through  intimidation,  coercion,  or 
instilling  fear.  Terrorism  involves  a  criminal  act  that  is  often  symbolic  in  nature 
and  intended  to  influence  an  audience  beyond  the  immediate  victim  (Department 
of  the  Army,  1983). 

Generally, terrorist support and recruitment are not due to any single causal factor, but instead
stem from the interplay between political aspirations of terrorist groups, vulnerable individuals,
employment  of  terrorist  ideology,  and  wider  social  support  for  terrorism.  These  latter 
components  increasingly  depend  on  religious  doctrine  being  used  to  support  the  underlying 
ideology  (Speckhard,  2006).  A  general  characterization  of  the  overall  strategy  of  terrorist 
organizations  is  to: 

•  motivate  ordinary  persons  to  carry  out  terrorist  acts  to  meet  their  objectives 

•  exploit  moral  outrage  and  feelings  of  humiliation  based  on  political  events 

•  convince  by  means  of  texts  used  in  behalf  of  terror  ideology 

We  discuss  each  of  these  components  of  terrorist  strategy  in  turn.  First,  with  respect  to  profiles  of 
individuals,  what  research  there  is  indicates  that  suicide  terrorists  have  no  appreciable 
psychopathology  and  are  at  least  as  educated  and  economically  well  off  as  their  surrounding 
populations  (Atran,  2003).  Furthermore,  education  does  not  appear  to  be  correlated  with  support 
for  terrorism.  Finally,  although  economic  despair  may  provide  a  partial  answer,  it  does  not  offer 
a  complete  explanation  (Barsalou,  2002).  Importantly,  individuals  who  are  vulnerable  to  terrorist 
recruitment are not motivated to take part in suicide terrorism without some form of ideology to
guide  them,  as  well  as  an  overall  organization  to  support  their  activities  (Speckhard,  2006).  In  the 
case  of  Islamic  terrorism,  the  focus  of  the  current  report,  the  terrorist  ideology  is  well  integrated 
with  a  larger  body  of  religious  doctrine. 

The  balance  of  evidence  suggests  that  a  primary  factor  for  recruitment  of  Islamic  terrorists 
(“Jihadists”)  is  that  they  come  from  at  least  moderately  religious  backgrounds.  For  example, 
interviews  with  terrorist  recruits  in  Pakistan  indicated  that,  “None  were  uneducated,  desperately 
poor,  simple  minded  or  depressed,”  and  “all  were  deeply  religious.”  They  believed  that  their  acts 
were  “sanctioned  by  the  divinely  revealed  religion  of  Islam”  (Hassan,  2001).  Furthermore,  it  also 
seems clear that religiosity is fostered as a part of the indoctrination process and that external
events  can  trigger  greater  attention  to  religion.  For  example,  Bosnian  Muslims  typically  report 
not  considering  religious  affiliation  a  significant  part  of  identity  until  seemingly  arbitrary 
violence  forced  awareness  upon  them  (Atran,  2003).  Keep  in  mind,  however,  that  the  root  of 
terrorist  motive  is  political  dissent,  and  that  religion  is  used  as  a  vehicle  for  achieving  political 
ends.  The  question  we  address  is  how,  specifically,  is  religious  doctrine  used  to  advance  terrorist 
agendas?  Our  thesis  is  that  influences  of  religious  messages  can  be  understood  to  operate  at  two 
psychological levels, the cognitive level and the metacognitive level.1 Furthermore, these two
levels  correspond  to  the  second  and  third  components  of  the  terrorist  strategy. 

Cognitive  Level 

The  second  component  of  Jihadist  strategy  is  to  exploit  public  emotional  responses  to  political 
events.  Terrorist  organizations  appear  to  be  quite  sophisticated  in  their  use  of  modern  media, 
including  use  of  the  World  Wide  Web  to  disseminate  vivid  imagery  of  moral  wrongdoing  by 
Americans  and  other  agents  of  the  West.  Furthermore,  humiliating  and  morally  outrageous 
events  are  not  considered  isolated  or  random,  but  rather  are  interpreted  within  an  overarching 
framework  that  a  unified  Western  strategy  exists  to  promote  a  “war  against  Islam”  (Sageman, 
2008). 

This  second  strategy  relies  heavily  on  terrorist  communication  of  specific  aspects  from  their 
ideological  framework  to  shape  the  common  perspective  of  their  intended  audiences.  For  the 
approach  to  be  successful,  the  ideas  they  are  promoting  must  fit  within  the  cultural  meaning 
systems  shared  across  the  population  they  are  addressing.  Hence,  this  second  strategy  operates  at 
the “cognitive” level. One application of cultural modeling to terrorism research is to explicitly
map out the relevant cultural meaning systems in order to better understand how and why
various messages appear to be effective in influencing people’s attitudes and garnering their
support  (see  Appendix  A;  Sieck,  2011).  Before  addressing  cultural  cognition  in  terrorism, 
however,  we  first  need  to  define  culture  from  a  cognitive  perspective. 

There  is  a  somewhat  natural  tendency  to  talk  about  culture  as  if  it  were  a  concrete,  material 
thing.  It  is  sometimes  described  as  something  people  belong  to,  or  as  an  external  substance  or 
force  that  surrounds  its  members  and  guides  their  behavior.  Although  it  is  sometimes  difficult  to 
avoid  speaking  in  these  metaphorical  terms,  such  an  ethereal  view  does  not  provide  a  useful  basis 
for  a  technical  definition.  An  alternative  approach  begins  by  defining  culture  in  terms  of  the 
widely  shared  ideas  (such  as  concepts,  values,  and  beliefs)  that  comprise  a  shared  symbolic 
meaning  system  (Rohner,  1984).  Within  this  conception,  a  population,  or  identifiable  segments 
of  a  population,  maintains  approximately  equivalent  and  complementary  learned  meanings.  In 
this  statement,  ‘approximately  equivalent’  acknowledges  that  no  two  people  within  a  culture 
share  exactly  the  same  ideas,  but  rather  highly  similar  meanings  are  shared  by  most  members  of 
a  society.  The  ‘complementary’  component  refers  to  the  fact  that  sharing  of  specialized 
knowledge  depends  on  status  and  roles  within  a  society  (e.g.,  an  imam  and  farmer). 

Taking this conception a step further, it is currently popular within cognitive science to draw on
a disease metaphor for understanding cultural ideas, describing the ideas that spread widely
through a population and persist for substantial periods of time as especially ‘contagious’
(Sperber, 1996). This theoretical framework is often referred to as the epidemiological view of
culture, drawing on the general sense of epidemiology as describing and explaining the
distributions of any property within a population. The starting point for working from this
epidemiological view is the individual idea as an atomic unit. People typically use the word idea
to refer to any content of the mind, including conceptions of how things are and of how things
should be. For instance, individuals may hold the concept that Western nations are joined
together in a covert war against Islam. Their minds may also contain the value that imported
Western ideals, such as the separation of religious and state affairs, are generally bad and so
should be avoided. The premise of the Cultural Network Analysis (CNA) approach to
cognitive-cultural modeling is that cultural knowledge consists of shared networks of ideas, and
that there is value in explicitly considering clusters of ideas and their interrelationships (Sieck,
Rasmussen, & Smart, 2010b). Networks of causally-interconnected ideas are often referred to as
folk theories or mental models (Gentner & Stevens, 1983). Such networks constitute people’s
explanations for how things work, and result in judgments and decisions that influence their
behavior.

1 Each of these levels can be seen to interact with emotional systems, as well. However, our position is that religious
messages do not directly influence emotions, but rather that cognitive interpretations of messages evoke emotions,
sometimes if not often, very quickly and intensely.

From  this  perspective,  culture  refers  to  mental  models,  and  other  contents  of  the  mind,  for  which 
there  is  some  level  of  concordance  across  members  of  a  population  over  a  period  of  time.  A 
potential  issue  associated  with  this  definition  of  culture  is  how,  then,  to  define  the  population  of 
interest.  The  term  cultural  group  refers  to  a  population  or  sub-population  of  people  that  largely 
share  the  interconnected  ideas  of  interest.  The  issue  is  that  cultural  groups  are  distinct  from,  but 
related  to,  demographic  groups  (i.e.,  groups  based  on  nationality,  educational  status,  etc.)  in  that 
the  demographic  delineations  relevant  to  a  particular  cultural  group  will  depend  on  how 
widespread  the  cultural  ideas  of  interest  are.  For  example,  Sunni  and  Shia  sectarian  distinctions 
make  little  difference  if  the  idea  of  interest  is,  “There  is  no  god  but  Allah,  and  Mohammad  is  his 
prophet.” However, if the relevant common beliefs include those pertaining to the 12th Imam,
then  that  demographic  does  become  important.  Hence,  the  relevant  cultural  group  for  a  study  will 
depend  on  the  cultural  domain,  that  is,  the  kind  and  topic  of  knowledge  of  interest. 

Consider  a  Sunni  Muslim  extremist  conception  of  socio-political  relationships  between  Islam 
and  the  West.  A  mental  model  of  such  relationships  contains  an  individual  person’s  concepts  as 
well  as  their  understanding  of  the  causal  relationships  between  concepts,  i.e.,  the  antecedents  and 
consequences  of  political  activities  and  their  outcomes.  This  mental  model  influences  the 
individual’s  expectations  for  how  socio-political  relationships  will  unfold  and  provides  a 
framework  for  selecting  behaviors  and  goals  within  this  context.  Figure  1  provides  a  network 
representation  that  might  describe  a  Sunni  Muslim’s  mental  model  of  current  political  events. 

The set of ideas represented in Figure 1 was extracted from articles that describe jihadist
narratives (Hafez, 2007; Sageman, 2008). Figure 1 depicts a number of ideas using circles, lines,
and color: circles represent concepts, lines represent causal beliefs, and color represents value,
with “green” viewed positively, and “red” viewed negatively. These ideas include simple
concepts such as “Western arrogance” and “Muslim honor” represented as circles. The figure also
includes causal ideas, such as the belief that developing a new Islamic caliphate would decrease
the extent of Western dominance and bring about a return of past Islamic glory. Lines in the figure
represent  these  causal  ideas,  with  +/-  indicating  the  direction  of  the  causal  belief.  Finally,  Figure 
1  portrays  ideas  of  desired  states  or  value  using  color,  as  well  as  a  logical  flow  across  desired 
states.  Developing  an  Islamic  caliphate  is  a  good  thing.  Maintaining  (and  enhancing)  Muslim 
honor  is  likewise  valued. 

Figure 1. Sunni Jihadist Cultural Model of political relationships.

Holding  the  beliefs  described  by  this  mental  model  is  likely  to  have  fairly  strong  consequences 
for  how  a  person  will  decide  and  act  in  a  number  of  specific,  relevant  situations.  For  example, 
according  to  the  model,  jihad  is  viewed  positively  and  should  be  supported  by  the  model’s 
adherents  due  to  the  perceived  anticipated  consequences  for  Muslims.  Most  directly,  support  for 
jihad  decreases  the  chances  that  the  West  will  continue  its  war  against  Islam,  and  enhances 
collective  Muslim  honor. 
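
The network representation described above can be sketched in code. The following Python sketch is illustrative only and is not the authors' implementation. The concept names and the four causal beliefs are taken from the text; the class name, the method names, the numeric coding (+1 for a positively valued concept or an increasing influence, -1 for a negatively valued concept or a decreasing influence), and the negative value assigned to "Western dominance" and "Western war against Islam" are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class CulturalModel:
    """Hypothetical container for a cultural model like the one in Figure 1."""
    # concept -> value: +1 viewed positively ("green"), -1 viewed negatively ("red")
    values: Dict[str, int] = field(default_factory=dict)
    # (cause, effect) -> sign: +1 cause increases effect, -1 cause decreases effect
    causal_beliefs: Dict[Tuple[str, str], int] = field(default_factory=dict)

    def add_concept(self, name: str, value: int) -> None:
        self.values[name] = value

    def add_belief(self, cause: str, effect: str, sign: int) -> None:
        self.causal_beliefs[(cause, effect)] = sign

    def consequences(self, concept: str) -> List[Tuple[str, int, bool]]:
        """List (effect, sign, desirable) for each direct consequence of a concept.

        A consequence counts as desirable when a valued concept is increased
        or a disvalued concept is decreased.
        """
        result = []
        for (cause, effect), sign in self.causal_beliefs.items():
            if cause == concept:
                desirable = sign * self.values[effect] > 0
                result.append((effect, sign, desirable))
        return result


model = CulturalModel()
model.add_concept("jihad", +1)
model.add_concept("Muslim honor", +1)
model.add_concept("Islamic caliphate", +1)
model.add_concept("past Islamic glory", +1)
model.add_concept("Western war against Islam", -1)  # assumed negative value
model.add_concept("Western dominance", -1)          # assumed negative value

model.add_belief("jihad", "Western war against Islam", -1)
model.add_belief("jihad", "Muslim honor", +1)
model.add_belief("Islamic caliphate", "Western dominance", -1)
model.add_belief("Islamic caliphate", "past Islamic glory", +1)

for effect, sign, desirable in model.consequences("jihad"):
    direction = "increases" if sign > 0 else "decreases"
    print(f"jihad {direction} {effect}: desirable={desirable}")
```

Under these assumptions, both direct consequences of jihad come out as desirable, which mirrors the statement that adherents of the model view jihad positively because of its anticipated consequences.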

As  implied  by  the  name,  mental  models  reside  inside  the  heads  of  individuals.  However,  when 
people  communicate  with  each  other  in  any  variety  of  modes,  they  develop  mental  models  that 
may  begin  to  resemble  one  another.  Mental  models  can  spread  widely  throughout  a  population, 
becoming  ‘cultural’  in  the  sense  of  being  shared  by  many  of  its  members.  A  cultural  model 
refers  to  an  external  representation  of  a  set  of  culturally-shared  mental  models  that  is  constructed 
by a researcher (Sieck et al., 2010b). A cultural model represents a consensus of the mental
models  for  a  particular  cultural  group  and  domain.  Hence,  for  the  Sunni  Muslims  who  hold 
beliefs  similar  to  the  elements  in  this  model,  Figure  1  serves  as  one  of  their  cultural  models  in 
the  domain  of  socio-political  relationships. 

Resulting  cultural  models  and  descriptions  of  their  dynamics  from  such  studies  can  provide 
considerable  insight  into  the  thinking  behind  communications  that  stem  from  terrorist  groups. 
They  also  provide  a  basis  for  developing  effective  counter-communications  by  aiding  in  the 
determination  of  what  makes  for  culturally  meaningful  messages  (Sieck,  2010).  Cultural  models 
would  allow  for  making  predictions  concerning  the  effectiveness  of  a  message  by  providing  the 
opportunity  to  assess  potential  unintended  inferences  that  individuals  with  a  certain  knowledge 
structure  might  make.  Specifically,  in  a  cultural  models  diagram,  each  concept  and  causal  belief 
represents  an  opportunity  to  effect  a  change  in  beliefs  or  concepts.  Hence,  such  diagrams  can 
provide  an  orderly  basis  for  determining  the  content  of  communications.  Messages  are  created  so 
as  to  affect  the  values  of  the  most  vulnerable  concept  nodes  (i.e.,  those  for  which  there  is  the 
least  consensus)  which  then  propagate  across  perceived  influences  to  affect  the  values  of  other 
concepts.  These  effects  spread  through  the  cultural  belief  network,  ultimately  changing  the  value 
in overall perceptions or cognitions (Sieck et al., 2011). With this CNA approach, information
efforts  focus  on  transmitting  the  most  relevant  information  to  effect  conceptual  change  in  a  way 
that  makes  sense  within  the  cultural  group’s  understanding.  However,  one  aspect  that  CNA  does 
not  take  into  account  is  the  certainty  with  which  individual  members  of  a  culture  hold  their 
beliefs.  Certainty  plays  a  central  role  as  the  third  component  of  the  terrorist  strategy. 
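
The idea of targeting the least-consensus node and letting the change spread across perceived influences can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the simulation method of Sieck et al. (2011): the linear update rule, the edge weights, the clamping of values to [-1, 1], the single-pass traversal, and all concept labels and numbers are hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple


def most_vulnerable(consensus: Dict[str, float]) -> str:
    """Return the concept with the lowest consensus score."""
    return min(consensus, key=consensus.get)


def propagate(
    values: Dict[str, float],
    influences: Dict[str, List[Tuple[str, float]]],
    target: str,
    delta: float,
) -> Dict[str, float]:
    """Shift `target` by `delta` and pass the change along signed influences.

    values: concept -> current value in [-1, 1]
    influences: cause -> list of (effect, weight), with weight in [-1, 1]
    Each concept is updated at most once, so feedback loops are ignored.
    """
    updated = dict(values)
    queue = deque([(target, delta)])
    visited = set()
    while queue:
        concept, change = queue.popleft()
        if concept in visited:
            continue
        visited.add(concept)
        # Clamp so that values stay within the assumed [-1, 1] range.
        updated[concept] = max(-1.0, min(1.0, updated[concept] + change))
        for effect, weight in influences.get(concept, []):
            queue.append((effect, change * weight))
    return updated


# Hypothetical three-concept network: B influences C, and C influences A.
values = {"A": 0.75, "B": -0.5, "C": 0.5}
consensus = {"A": 0.9, "B": 0.4, "C": 0.7}
influences = {"B": [("C", -0.5)], "C": [("A", 0.5)]}

target = most_vulnerable(consensus)  # "B" has the least consensus
after = propagate(values, influences, target, delta=0.25)
print(target, after)
```

In this toy network the message shifts B directly, and C and A change by progressively smaller amounts, which is one simple way to represent effects that weaken as they spread through the belief network.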

Metacognitive Level

The  third  component  of  terrorist  strategy  is  to  ensure  that  recruits  are  convinced  so  thoroughly 
that  they  will  not  consider  backing  out,  let  alone  feel  any  mercy  or  remorse  about  their  actions. 
For  a  suicide  terrorist  in  particular,  this  means  they  will  carry  forward  with  no  doubt  about  their 
decision  to  die  in  order  to  kill  others  (Speckhard,  2006).  Researchers  have  sometimes  described 
the  fully  indoctrinated  terrorist  as  a  “Cosmic  Warrior,”  who  harbors  no  ambiguity  or  doubt  about 
the  mission  or  means  to  accomplish  it  (Juergensmeyer,  2000).  This  religious  conviction  includes 
a  fundamental  belief  that  the  terrorist  knows  the  mind  of  God.  Such  a  belief  justifies  a  complete 
lack  of  tolerance  for  divergent  ideas,  even  of  other  believers  who  disagree  with  the  terrorist 
group  on  specific  issues. 

From  a  model-development  standpoint,  we  conceive  of  the  decision  to  accept  the  terrorist 
group’s  worldview  as  the  central  node  within  the  highest-level  of  a  hierarchy  of  terrorist  decision 
frameworks.  Developing  this  meta-level  framework  of  absolute  conviction  in  support  of  the  full 
hierarchy  of  terrorist  decision  frameworks  is  understood  to  rest  on  a  subtle  and  slow 
indoctrination  process.  Although  we  know  that  the  successful  exploitation  of  religious  texts  is  a 
key  component  of  the  process,  the  set  of  specific  religious  ideas  that  promote  such  certitude 
remains  unclear.  That  is,  we  know  that  terrorists  become  totally  convinced  through  religious 
texts  (Speckhard,  2006).  But,  what  in  the  religious  doctrine  accomplishes  such  moral  conviction? 
How  do  specific  religious  ideas  eliminate  doubt? 

We  propose  that,  as  new  recruits  become  indoctrinated  with  beliefs  justifying  terrorism,  they  are 
also  being  indoctrinated  with  specific  “metacognitive”  beliefs  drawn  from  Islamic  sources  that 
serve  to  erase  doubt  in  the  terrorist  agenda  and  provide  psychological  defenses  against  contrary 
views.  In  general,  metacognition  refers  to  knowledge  and  beliefs  about  one’s  own  knowledge 
and  thought  processes  (Flavell,  1979;  Metcalfe  &  Shimamura,  1994).  Whereas  the  cognitive 
level  addresses  what  to  believe  about  the  world,  the  metacognitive  level  addresses  instead  how 
one  should  believe.  We  refer  to  the  metacognitive  beliefs  introduced  by  extremists  and  terrorists 
as  “Polarizing  Metacognitive  Ideas”  (PMIs).  For  example,  the  idea  that  pluralism  results  in 
spiritual  contamination  is  a  PMI  because  holding  it  serves  to  discount  any  “cognitive  level”  ideas 
that  diverge  from  one’s  current  belief  set.  Another  example  is  that  innovative  ideas  are 
necessarily  false  and  evil.  PMIs  are  specific  kinds  of  beliefs  that  affect  the  cognitive  processes 
that  govern  feelings  of  confidence  in  worldviews.  The  excessive  levels  of  confidence  that 
ultimately  result  from  collections  of  PMIs  serve  to  sanction  extreme  actions  (e.g.,  supporting  or 
attempting  to  accomplish  nuclear  terrorism).  Past  research  has  shed  considerable  light  on  the 
cognitive  processing  underlying  overconfidence  and  its  relationship  to  decisiveness  both  within 
and across cultures (Sieck, Merkle, & Van Zandt, 2007; Yates et al., 2010). The past work on
overconfidence  provides  important  grounding  for  determining  the  polarizing  or  moderating 
effects  of  specific  metacognitive  ideas.  In  the  current  study,  we  use  this  past  work  in  conjunction 
with  new  analyses  of  the  contents  of  Islamic  web-based  discussions  and  sources  in  order  to  test 
our  primary  hypothesis  that  moderate  and  extremist  Islamic  ideologies  can  be  discriminated  by 
embedded  metacognitive  beliefs  relevant  to  certainty.  Specifically,  we  expect  that  extremist 
Islamic  sources  will  include  metacognitive  ideas  that  foster  cognitive  processing  that  leads  to 
increased confidence in beliefs, whereas moderate Islamic sources will tend to include
metacognitive  ideas  that  lead  to  more  balanced  confidence  in  one’s  beliefs. 

USING THE WORLD WIDE WEB TO MEASURE CULTURAL VALUES: OVERVIEW OF STUDIES AND METHODS

The  World  Wide  Web  is  a  rich  source  of  cultural  information.  Aside  from  resources  that  provide 
background  information  about  cultural  groups  (e.g.,  their  history,  geographical  location  and 
ethno-linguistic  properties),  the  Web  also  provides  important  clues  as  to  the  beliefs,  attitudes  and 
values  of  group  members.  It  is  also  readily  available  and  free,  making  it  very  accessible.  The  use 
of  the  Web  as  a  cultural  knowledge  source  is,  however,  complicated  by  its  large-scale  nature 
(which  makes  relevant  information  hard  to  find)  and  the  fact  that  relevant  information  is  often 
represented in natural language formats (this limits the kind of automated processing that can be
undertaken).  Ideally,  what  is  required  is  a  method  for  extracting  culture-relevant  information 
from  Web-based  resources  and  representing  this  information  in  a  way  that  supports  cultural 
modeling  and  analysis. 

Web-based  Collection 

The  overall  study  approach  was  to  collect  knowledge  source  materials  from  the  World  Wide 
Web,  translating  from  Arabic  to  English  where  needed.  We  then  extracted  relevant  passages  for 
classification  and  comparison  purposes  in  order  to  elucidate  the  role  of  metacognitive  ideas  and 
other  ideological  characteristics  within  them.  Our  initial  search  criteria  focused  on  materials  that 
provided  Islamic  justifications  for  Jihad  and/or  terrorism  from  leaders  and  learned  clerics  (on  the 
extremist  side),  or  documents  that  provided  Islamic-based  counter-arguments  to  Jihad/terrorism 
(or  arguments  that  constrain  it)  on  the  moderate  side.  These  criteria  also  provided  our  operational 
definitions  of  extremist  and  moderate  web  sources,  respectively.  We  collected  58  documents 
(2,011 pages) using these initial search parameters. Findings from the initial round of collection
and  other  sources  suggested  a  set  of  search  parameters  that  focused  more  on  metacognitive  ideas, 
including  a  new  set  of  questions,  phrases,  and  terms  for  use  in  searching  Arabic  web  documents. 
We  collected  another  210  documents  (1,723  pages)  using  these  refined  search  parameters.  The 
specific  search  parameters  guiding  the  second  round  of  collection  along  these  dimensions 
included: 

1.  What  is  the  Islamic  concept  of  knowledge?  What  is  the  Islamic  attitude  towards  learning? 
What  is  the  feeling  about  change  in  thinking  or  culture?  Arabic  word:  bida 

2.  What  is  the  Islamic  feeling  about  different  ways  of  knowing?  Is  there  a  contradiction 
between  Islamic  religion  and  science?  What  is  the  Islamic  stance  on  pluralism?  Arabic 
words:  kafir,  taq’yya 

3.  What  should  be  the  relationship  between  Muslims  and  non-Muslims?  Should  Muslims 
interact  with  non-Muslims?  How  much,  under  what  conditions?  How  to  treat  non- 
Muslims? 

4.  How  much  freedom  of  thought  is  there  in  Islam?  Can  believers  interpret  Koran/other 
texts?  Should  believers  rely  on  authority  to  understand  Koran  in  Islam?  What  about 
critical  thinking,  independent  reflection  in  Islam?  Arabic  words:  ijtihad,  at-taffakkur, 
at-taddabbur 

5.  How  zealous  or  strict  should  Muslims  be  in  their  spiritual  practices?  What  does  Islam  say 
about  moderation,  avoiding  excesses?  Arabic  word:  at-tassut 

We collected documents from three different collections of sources on the web. First, we
conducted searches of websites associated with various known Islamic extremists, which we
describe in more detail below. In order to establish moderate baselines for comparison, we
conducted parallel collection from broad-based Google searches of two additional sources:
U.S. English Islamic sites and pan-Arabic language Islamic sites.

With  respect  to  the  extremist  websites,  we  searched  a  variety  of  sources,  including  religious 
teachers who post sermons (e.g., Sheikh Mohammed Alvzazi, Sheikh Abu Saad Al-Ameli, and
Mufti Sheikh Mohammed Saleh), discussion forums such as Altartosi, Al Fazzazi, and Tawhed, and
websites of terrorist organizations such as AQ, Hezbollah, Hamas, and Al-Quds. We provide examples
of  some  of  the  most  productive  sites  for  illustration: 

http://www.scholarofthehouse.org/

A  website  dedicated  to  Khaled  Abou  el-Fadl,  which  has  his  books  and  lectures  available 
for  download. 

http://muslimdefenseforce.islamicink.com/lectures-of-sheikh-faisal/ 

This  website  does  not  function  reliably,  but  when  it  is  functioning,  it  contains  an  abundance  of 
extremist  content,  including  lectures  by  Sheikh  Faisal,  Awlaki  and  other  “leaders  of  jihad.”  The 
website  has  numerous  relevant  sections,  such  as  sections  dedicated  to  Taliban  military 
operations,  pictures  of  “American  Zionist  crusader  hell  in  Afghanistan,”  ummah  news,  and  the 
mujahedeen  in  pictures. 

http://www.revolutionmuslim.com/2010/07/islamic-ruling-on-permissibility-of.html 

This  is  a  very  popular  website  which  contains  links  to  videos  and  articles  supporting  the 
extremist  scholars,  such  as  Shaykh  Omar  Bakri  Muhammad,  Abu  Hamza  al-Masri,  Abdullah 
Azzam,  and  Anjem  Choudary.  While  the  website  is  very  well  known  and  contains  links  in  which 
participants  who  post  comments  openly  support  jihad,  the  creator  of  the  website  makes  a  point  to 
never  actually  incite  jihad  or  say  anything  that  would  be  considered  “extremist”  in  nature.  The 
creator  insists  the  website  is  to  serve  as  an  up-to-date  news  source  for  Muslims  that  will  counter 
the  mainstream  propaganda  put  forth  by  other  news  agencies. 

http://www.abubaseer.bizland.com/ 

This  is  Abu  Basir  al  Tartusi’s  website,  in  Arabic.  The  site  has  a  publications  page  as  well  as  a 
research  and  articles  page  where  documents  are  made  available  to  read  and/or  download.  The 
site  also  has  lectures  available  in  an  audio  format.  On  the  website,  Al  Tartusi  states  numerous 
reasons  why  the  website  was  created,  including  to  serve  as  “an  invitation  to  jihad  in  the  cause  of 
Allah  which  is  the  Ummah’s  only  way  to  change  this  state  of  humiliation  and  to  establish  a 
rightly-guided  Islamic  way  of  life.” 

http://iskandrani.wordpress.com/ 

This  website  has  a  scholars  section  which  includes  240  lectures  and  postings  regarding  30 
scholars,  including  Abdullah  Azzam,  Abu  Basir,  Abu  Muhammad  al-Maqdisi,  Abu  Qatadah, 
al-Khatib al-Baghdadi and many others. The website has not been updated since October 2009,
but  still  serves  as  a  good  source  of  information.  It  also  contains  a  books  and  PDFs  section  with 
downloadable  materials  from  the  above-mentioned  scholars. 

http://izzatulillah.wordpress.com/ 

This  website  has  books  available  from  different  scholars  including  Maqdisi.  Additionally, 
material  can  be  found  in  the  daily/weekly  web  postings. 

http://www.tawhed.net/  and  http://www.tawhed.ws/ 

This  website  is  a  very  good  resource  for  information,  including  a  section  with  books  from 
scholars  that  are  available  for  download.  Some  examples  include  Azzam  and  Qutb,  and  a  link  to 
“39  Ways  to  Serve  and  Participate  in  Jihad.”  This  website  includes  articles  about  and  interviews 
of  individuals  such  as  bin  Laden,  al-Yazid  and  Adam  Gadahn,  and  an  assortment  of  magazines, 
such  as  “The  Call  of  Islam.”  There  are  also  links  to  other  similar  websites  and  a  library  section 
strictly  dedicated  to  Maqdisi. 

http://www.khayma.com/ 

This  website  has  a  lot  of  information  and  numerous  links  to  explore.  The  Islam/Islamic  Jihad 
section  includes  many  articles  and  statements,  and  includes  links  to  sites  for  individual  extremist 
groups. There is also a “Lectures” section on the “Islam” link. The website also contains
non-extremist information in addition to the Islamic Jihad content.

Translation 

The approach for analyzing resources that appeared in Arabic was to have each
document translated into English before analysis. Transliteration tools and human translation
were  used  in  conjunction  to  perform  this  task.  Transliteration  provides  a  word-for-word 
translation  and  often  requires  a  human  analyst  in  the  process  to  make  sense  of  the  translated 
materials.  These  tools  were  used  in  the  collection  and  initial  vetting  of  documents.  Human 
translation  was  performed  on  selected  documents  and  segments  that  best  met  the  study  criteria  to 
improve  comprehension.  In  some  cases,  interviews  were  conducted  with  native  speakers  about 
specific  passages  to  better  understand  the  context,  subtle  connotations,  and  other  Islamic 
references  regarding  the  ideas  contained  within  them. 

STUDY 1: EXPLORATORY CULTURAL ANALYSIS
Method 

As  indicated  above,  the  collection  effort  was  quite  extensive,  as  it  was  intended  to  supply 
information  to  be  used  in  the  automated  text  analysis  study  that  was  the  primary  focus  of  Study 
2,  as  well  as  for  the  initial  “human”  analysis  effort  reported  here.  In  order  to  maintain  a 
manageable  amount  of  information  for  human  analysis,  we  randomly  selected  a  subset  of 
documents  to  work  with  for  extensive  qualitative  and  quantitative  analysis  purposes.  In  the  end, 
we settled on 11 documents from Arabic extremist websites and 15 documents from U.S. English
mainstream searches. Although individual documents varied in length, the complete set was
extensive by qualitative research standards. Two members of the research team then read the 26
documents,  and  conducted  thematic  content  analyses  in  order  to  identify  examples  of  different 
kinds  of  metacognitive  ideas.  Based  on  the  results,  the  researchers  developed  a  model 
representing the distinct kinds of metacognitive ideas. The qualitative results section below
contains the model, which is supported by different examples. The researchers then reviewed the
documents  again,  and  extracted  all  the  excerpts  they  identified  that  appeared  to  be  related  to 
metacognition  in  some  way,  or  to  have  implications  for  metacognitive  processing.  This  process 
resulted in a set of 521 excerpts for further analysis, as described in the quantitative analysis
results  section  below. 

Results 

Qualitative Analysis and Results. The initial thematic analysis results were organized into a model
of  five  dimensions  of  metacognitive  ideas.  These  five  dimensions  were  hypothesized  to 
discriminate  between  extremist  and  moderate  Islamic  ideologies  in  psychological  terms.  We 
discuss  each  of  the  dimensions,  in  turn,  including  how  each  relates  to  increasing  confidence  in 
belief  at  the  “cognitive  level.”  To  elucidate  each  dimension  further,  we  also  provide  specific 
examples  of  Islamic  ideological  content  drawn  from  the  thematic  analysis. 

1. Knowledge: Maintenance vs. change at individual or cultural levels. Cultural ideas that
emphasize  the  priority  and  continuance  of  long-established  conceptions  of  the  world  and 
associated  practices  appear  to  be  quite  central  kinds  of  polarizing  metacognitive  ideas, 
serving  as  such  to  increase  the  level  of  certainty  that  existing  culturally-shared  knowledge 
and  worldviews  are  correct.  On  the  other  hand,  beliefs  that  emphasize  knowledge  acquisition 
and  change  at  the  individual  and  cultural  levels  imply  that  existing  beliefs  may  be  wrong, 
incomplete, or no longer fit with current situations. Interestingly, such beliefs are sharply
contrasted  among  different  groups  of  Muslims,  as  illustrated  in  the  following  example 
passages. 

Knowledge  Maintenance 

America...  came  to  change  the  fundamentals  of  the  nation,  changing  the 
curriculum,  and  the  removal  of  goodness  springs  in  the  conscience  of  the  Islamic 
nation,  and  blocks  the  road  to  awakening 

...In ittibaa’ (following of) the prophet (saw), which is built upon following his
sunnah and turning away from bida’ (religious innovations). Sunnah is good and
is to be followed, whilst bida’ is evil and is to be shunned.

The  prophet  (saw)  has  informed  us  of  this  in  the  hadeeth  of  Hudayfah  (ra)  in 
which  he  said:  “The  people  used  to  ask  the  Messenger  of  Allah  (saw)  about  the 
good,  and  I  used  to  ask  him  about  the  evil  out  of  fear  that  it  would  reach  me. 

...I further  enquired:  “Then  is  there  any  evil  after  this  good?  ” 

He  said:  “Yes!  Callers  at  the  gates  of  Hell  -  whoever  responds  to  their  call,  they 
will  be  thrown  into  the  fire.  ”  ...Meaning:  Whoever  obeys  the  callers  of  innovation 
and  misguidance  then  his  end  will  be  the  fire,  because  the  Prophet  (saw)  has  said: 
“Every  innovation  is  misguidance  and  every  misguidance  is  in  the  fire.  ” 

So  the  innovation  is  in  the  fire  along  with  its  companion.  Whoever  obeys  the 
callers  of  innovation  will  be  led  to  the  Fire  and  whosoever  obeys  the  callers  of 
Sunnah  will  be  led  to  Paradise. 

We  have  to  acquire  knowledge  of  the  Qur’an  and  the  Sunnah  upon  the 
understanding  of  the  Salafus-Saalih  (Pious  predecessors — the  first  three 
generations  of  Muslims)  in  order  to  comprehend  our  state  of  affairs.  However,  if 
we  rely  on  newspapers,  magazines  and  the  radio  then  these  media  sources  belong 
to  the  disbelievers,  the  West.  Will  they  be  truthful  in  their  narrations  and  in  their 
solutions?  Do  they  really  want  good  for  the  Muslims?  Indeed,  they  do  not  spread 
except  that  which  weakens  the  Muslims  and  makes  them  falsely  believe  in  the 
West. 

Knowledge  Change 

With  time  God’s  message  takes  on  a  certain  form  and  religious  tradition.  From 
time  to  time  scholars  add  to  and  amplify  this  message.  Therefore,  religion 
becomes a dynamic and evolving concept, rooted in history but given various
interpretations  suited  to  our  times  and  addressing  our  problems  and  challenges. 
We  should  learn  and  incorporate  what  is  good  in  the  American  experience,  listen 
to  new  ideas  and  not  be  afraid  of  new  interpretation  of  religious  doctrine. 

Beside various Qur’anic verses emphasizing the importance of knowledge, there
are  hundreds  of  Prophetic  traditions  that  encourage  Muslims  to  acquire  all  types 
of  knowledge  from  any  corner  of  the  world.  Muslims,  during  their  periods  of 
stagnation  and  decline,  confined  themselves  to  theology  as  the  only  obligatory 
knowledge, an attitude which is generally but wrongly attributed to al-Ghazali’s
destruction  of  philosophy  and  sciences  in  the  Muslim  world. 

‘ilm, attainment of which is obligatory upon all Muslims covers the sciences of
theology, philosophy, law, ethics, politics and the wisdom imparted to the Ummah
by  the  Prophet  (S).  Al-Ghazali  has  unjustifiably  differentiated  between  useful  and 
useless  types  of  knowledge.  Islam  actually  does  not  consider  any  type  of 
knowledge  as  harmful  to  human  beings.  However,  what  has  been  called  in  the 
Qur’an as useless or rather harmful knowledge, consists of pseudo sciences or the
lores  prevalent  in  the  Jahiliyyah. 

Islam  never  maintained  that  only  theology  was  useful  and  the  empirical  sciences 
useless  or  harmful.  This  concept  was  made  common  by  semi-literate  clerics  or  by 
the  time  servers  among  them  who  wanted  to  keep  common  Muslims  in  the 
darkness  of  ignorance  and  blind  faith  so  that  they  would  not  be  able  to  oppose 
unjust  rulers  and  resist  clerics  attached  to  the  courts  of  tyrants.  This  attitude 
resulted  in  the  condemnation  of  not  only  empirical  science  but  also  ‘ilm  al-kalam 
and  metaphysics,  which  resulted  in  the  decline  of  Muslims  in  politics  and 
economy.  Even  today  large  segments  of  Muslim  society,  both  the  common  man 
and  many  clerics  suffer  from  this  malady.  This  unhealthy  and  anti-knowledge 
attitude  gave  birth  to  some  movements  which  considered  elementary  books  of 
theology  as  sufficient  for  a  Muslim,  and  discouraged  the  assimilation  or 
dissemination  of  empirical  knowledge  as  leading  to  the  weakening  of faith. 

2.  Coherence:  Homogeny  vs.  diversity  (individual  and  cultural  levels).  The  recruitment  and 
consideration  of  a  broad  set  of  divergent  ideas  is  typically  considered  to  be  critical  among  the 
cognitive processes that protect against excessive confidence in one’s beliefs (Koriat,
Lichtenstein, & Fischhoff, 1980; Lee et al., 1995). Such a thinking style naturally requires
that  diverse  ideas  are  available,  either  in  individuals’  knowledge  bases  or,  perhaps  more 
commonly,  distributed  among  individuals  within  communities  at  various  levels  of 
granularity.  Hence,  ideas  that  stress  the  acceptability  of  diversity  in  beliefs  and  knowledge 
ensure  the  availability  of  divergent  ideas  that  can  enter  into  decision  processes.  Ideas  that 
emphasize  homogeny  in  beliefs  and  knowledge,  in  contrast,  reduce  the  chances  of  divergence 
and  hence  contribute  to  increased  confidence  in  decision  making.  As  shown  below,  specific 
ideas  that  emphasize  homogeny  and  diversity  were  found  to  correspond  with  extremist  and 
moderate  Muslims,  respectively. 

Homogeny 

The  Islamic  Jihad  has  been  emptied  of  its  contents  through  different  schemes  and 
weird  manners  of  lying,  falsification,  and  twisting.  The  knowledgeable  ones 
among  the  Islamic  scholars  know  quite  well  that  the  ultimate  goal  of  Jihad  in 
Islam  is  for  no  religion  to  remain  on  Earth  except  Islam,  as  Allah,  the  All-Exalted, 
says,  “And  fight  them  until  there  is  no  more  Fitnah  (disbelief  and  polytheism)  and 
for  religion  to  be  that  of  Allah.  But  if  they  cease,  then  certainly,  Allah  is  All-Seer 
of  what  they  do.  ”  Al-Anfal,  verse:  39 

People, in the Shari’ah of Allah, are to be classified according to their religion.

Among them is the Mu’men (Believer), and the Kafir (infidel); and each is to
undergo  the  terms  related  to  him  according  to  his  tenet,  his  attributes,  and  they 
are  divided  in  the  way  Allah  has  described  them.  Allah,  the  All-Exalted,  says,  “It 
is  He  who  created  you,  but  one  of  you  is  an  unbeliever  and  another  of  you  is  a 
Believer;  and  Allah  sees  what  you  do.  ”  At-Taghabun,  verse:  2 

Nation of Islam you should know, that the Shiites is the religion that does not meet
with Islam, but will also meet with Christians, Jews under the name of the people
of  the  book.  It  is  distortion  of  the  Koran,  and  insulting  the  companions,  and  to 
challenge  the  Mothers  of  the  Believers. 

Diversity 

From  time  immemorial,  man  has  found  different  ways  of  knowing  God.  Human 
beings  of  various  intellectual  levels  have  found  their  own  ways  to  God.  Common 
people  have  found  simple  ways;  whereas  thinkers  and  philosophers  reached  the 
same  conclusion  on  a  higher  plane  of  thought. 

“ The  same  religion  has  He  established  for  you  as  that  which  He  enjoined  on 
Noah,  and  that  which  We  sent  by  inspiration  to  you  and  that  which  We  enjoined 
on  Abraham,  Moses  and  Jesus:  namely  that  you  should  remain  steadfast  in 
religion,  and  make  no  division  therein  ”  (Ash-Shura  ’  42:13) 

Only few messengers were mentioned in the Qur’an; others were alluded to “and
messengers we related to you, and messengers we did not relate to you” An-Nesa’a
4:164 & 40:78. Some of these messengers could have been sent to India
or  China. 

3.  Information  Exchange:  Separation  vs.  interaction  with  other  groups.  Beliefs  concerning 
religious groups’ interactions with non-members affect confidence by way of the same
cognitive  mechanisms  as  beliefs  about  diversity.  That  is,  these  beliefs  also  enhance  or  curtail 
the  potential  for  recruitment  of  divergent  considerations  into  members’  thought  processes. 
However,  in  this  case,  the  effects  are  achieved  as  a  result  of  beliefs  that  govern  the  extent  and 
kind  of  communication  with  others  who  may  yield  such  considerations,  rather  than  by 
attempting  to  influence  the  level  of  homogeneity/diversity  at  the  source  (Sniezek  &  Henry, 
1989).  Examples  of  each  kind  of  idea  follow: 

Maintain  Separation  from  Other  Groups 

Imam  Al-Bukhari  -  may  God  have  mercy  on  him  -  says:  “I  do  not  forget  the 
infidels  only  I  know  of  their  destiny.  ”  He  said,  “Do  not  pray  after  a  Jew  or  a 
Christian,  do  not  eat  slaughtered  meat  from  them,  do  not  attend  their  funerals, 
and  do  not  take  care  of  their  sick.  ” 

Engage  in  Interaction  with  Other  Groups 

Allah  (S.W.T)  said  in  surat  Al-Mumtahinah,  what  can  be  translated  as,  “Allah 
does  not  forbid  you  (Muslims)  to  deal  justly  and  kindly  with  those  who  have  not 
fought  against  you  in  accounts  of  your  religion  and  who  do  not  drive  you  out  from 
your  homes.  Verily,  Allah  loves  those  who  deal  with  equity.  ”  (Verse  8).  This  great 
verse  clearly  states  the  normal  and  original  state  for  a  good  relationship  between 
Muslims  and  Non-Muslims. 


Approved  for  Public  Release  (Distribution  is  Unlimited) 




Cognitive  Solutions  Division 
Applied  Research  Associates,  Inc. 


Final  Report 

Prime  Contract  No.:  N00014-10-C-0078 


The  future  of  Islam  depends  on  safeguarding  our  convictions  and  living  our  Islam 
and  be  active  members  of  society,  not  in  hiding  or  retreating  but  in  reaching  out 
and  working  with  people  of  other  faiths  to  improve  the  life  of  all  Muslims  and 
Non-Muslims  alike. 

4.  Judgment:  Authority  vs.  independence.  Whereas  the  two  prior  dimensions  emphasized  the 
recruitment  of  considerations  component  of  cognitive  processing,  the  current  dimension  is 
concerned  with  who  is  sanctioned  (and  responsible)  to  engage  in  such  processing  in  the  first 
place  (Yates,  2003).  Beliefs  including  that  only  a  selected  few  should  make  appraisals  and 
judgments  pertaining  to  the  religious  system  can  serve  to  increase  confidence  in  the  system 
across  the  board  in  at  least  a  couple  of  ways.  First,  many  members  are  often  not  privy  to 
significant  discussions  and  disagreements  that  arise  from  time  to  time  among  the  group  with 
authority,  and  hence  competing  thoughts  are  unlikely  to  enter  the  members’  own 
representations.  Secondly,  propagation  of  knowledge  for  such  groups  is  more  likely  to 
emphasize  repetition  and  memorization  among  adherents.  In  groups  where  individuals  are 
assigned  more  responsibility  to  interpret  situations  and  exercise  judgment  themselves, 
learning  tends  to  include  simulations  to  practice  thinking  in  this  manner.  Past  work  on 
confidence  suggests  that  learning  processes  that  stress  memorization  lead  to  increased 
overconfidence,  as  compared  with  more  reflective  learning.  In  the  Islamic  passages 
examined,  two  issues  emerged  that  generally  seemed  to  fit  within  the  authority  vs. 
independent  judgment  dimension.  The  first  was  a  question  of  whether  or  not  individuals 
should  choose  what  religion  to  belong  to  (if  any),  and  the  second  had  to  do  with  the  extent  to 
which members ought to formulate their own interpretations of religious texts, as well as how
various  teachings  would  apply  in  given  situations.  A  few  examples  are  provided  below. 

Authority-based  Judgment 

We  do  not  have  in  our  religion  such  thing  as  freedom  of  belief  but  we  have  in 
our  religion  is  what  the  Holy  Prophet  in  Bukhari:  “Whoever  changes  his  religion, 
kill  him.  ” 

The  freedom  of  belief,  as  put  forward,  and  personal  freedom  is  not  the  right  one 
for  a  Muslim  country.  Security  and  stability  comes  with  faith  and  victory  for  the 
religion  of  Allah,  there  is  no  security  or  stability  when  it  is  not  security  and 
stability  to  the  religion  of  God  first  and  foremost 

Independence  in  Judgment 

“Say:  The  truth  is  from  your  Lord”  Let  him  who  will  believe,  and  let  him  who  will 
reject  faith  ”  (Al-Kahf  18:29) 

“ Let  there  be  no  compulsion  in  religion:  truth  stands  out  clear  from  error” 

(Al-Baqara  2:256) 

Messengers  invite  to  Allah,  but  do  not  force  others  into  belief...  “If  it  had  been  the 
will  of  your  Lord,  they  would  all  have  believed,  all  who  are  on  earth!  Will  you 
then  compel  mankind  to  believe  against  their  will!”  (Yunus  10:99) 

Some Muslims understand that old scholars lived in a different society, and that
their opinions are not written in stone. They note that some of these great
scholars changed their rulings based on new information and different
circumstances. They understand that the Qur’an asks us to think and consider and
reach  conclusions  that  help  us  in  our  life.  They  realize  that  Ijtihad  (critical 
thinking  to  come  up  with  solutions  to  new  problems)  should  always  be  available 

5.  Deliberation:  Polarized  vs.  balanced.  The  final  dimension  concerns  beliefs  about  the  extent 
to  which  a  person  should  consider  all  evidence  and  opinions  within  a  situation,  weighing  the 
pros  and  cons  of  various  alternatives  during  deliberation,  or  whether  they  should  instead 
apply  (moral)  principles  in  absolute  terms.  The  polarized  endpoint  of  this  dimension  further 
emphasizes  that  one  way  is  absolutely,  100%  correct  or  good,  that  one’s  own  goals  are 
paramount,  so  any  actions  necessary  to  achieve  goals  are  reasonable,  and/or  that  one  should 
accept opinions that support one’s desired beliefs/actions. Examples include:

Polarized  Deliberation 

One  of  the  most  eloquent  symptoms  of  the  moral  bankruptcy  of  Western  culture  is 
a certain fashionable attitude toward moral issues, best summarized as: “There
are  no  blacks  and  whites;  there  are  only  grays.  ” 

Balanced  Deliberation 

They  wish  to  dig  out  the  rare,  strange,  and  doubtful  opinions  from  here  and  there 
in  order  to  strengthen  and  back  up  their  rants 

One  needs  to  know  of  something  called  balance  and  justice... people  should  look 
for  the  truth  from  themselves  and  from  other  than  them 

An additional dimension, initially considered at the qualitative analysis stage, was the frequency and
intensity  with  which  believers  would  be  expected  to  engage  in  religious  rituals  and  other 
practices.  The  idea  here  is  that  the  more  time  and  energy  participants  expend  on  religious 
practices,  to  the  exclusion  of  other  kinds  of  activities,  then  the  more  salient  and  highly  activated 
the  system  of  religious  knowledge  will  be  to  them.  Highly  activated  knowledge  tends  to  be 
retrieved  very  fluently,  and  ease  of  retrieval  has  been  shown  to  have  a  positive  influence  on 
confidence  (Sanna,  Schwarz,  &  Small,  2002).  In  the  end,  we  excluded  this  dimension  from 
further  consideration  as  we  did  not  find  direct  passages  that  discuss  the  intensity  of  religious 
practices  among  adherents.  However,  expressions  regarding  strong  devotion  have  been 
summarized  by  other  researchers  (Hafez,  2007): 

As  for  suicide  bombers  in  particular,  almost  invariably  they  are  portrayed  as 
genuinely  religious  people  who  love  jihad  more  than  they  love  life  and  fear  God 
more  than  they  fear  death.  The  biographies  often  detail  at  length  how  the  ‘martyr  ’ 
used  to  pray  incessantly  and  spent  his  time  reading  the  Quran.  The  bombers  are 
said  to  have  prayed  in  the  mosque,  as  opposed  to  praying  at  home,  which  is  the 
best  option  in  the  eyes  of  God.  They  often  pray  more  than  the  average  Muslim, 
certainly  more  than  is  expected  of  them  by  God.  They  also  wake  up  to  make  their 
pre-dawn  prayers  (qiyam),  which  is  not  a  religious  obligation,  but  a  voluntary 
expression of devotion. Some are said to have memorized the Quran by heart at a
very young age; others fast every Monday and Thursday, when they are not
required to do so by religion (although it is part of the Sunna). (pp. 103-104)

On  the  other  hand,  we  did  encounter  a  few  examples  suggesting  temperate  religious  practice, 
such  as: 

Anas  related  the  story  of  the  three  men  who  asked  the  Prophet’s  wives  about  his 
worship. When they vowed to pray all night, or fast every day or refuse to marry,
the Prophet (PBUH) said: (By Allah, I fear Allah the most and among you I know
Him the best, but I fast but not every day, and I pray at night but not all night, and
I marry women; whoever does not wish to follow my way, he is not from us)

Quantitative  Analysis  and  Results 

We  developed  a  detailed  coding  scheme  based  on  the  results  from  the  qualitative,  thematic 
analysis  of  the  data.  As  mentioned  earlier,  we  also  extracted  over  500  excerpts  from  the 
collected  documents,  and  then  had  two  raters,  working  independently,  code  each  excerpt  in 
terms  of  its  metacognitive  dimension  and  valence.  We  describe  the  coding  scheme  and  inter-rater 
reliability  next. 

Coding  and  Reliability 

1. Knowledge: Maintenance vs. change (individual or cultural). This dimension refers to
beliefs about the pursuit of knowledge and innovation. Beliefs that encouraged knowledge
maintenance were coded as -1 and beliefs that supported knowledge change were coded
as  +1. 

Knowledge  Maintenance 

•  Emphasize  the  priority  and  continuance  of  long-established  conceptions  of  the  world 
and  associated  practices 

•  Encourage  return  to  past  ways 

•  Indicate  priority  of  learning  traditional  doctrine  over  other  forms  of  knowledge 

•  Signify  that  some  forms  of  knowledge  are  bad/useless 

Knowledge  Change 

•  Emphasize  the  importance  of  learning,  knowledge,  knowledge  acquisition 

•  Value  of  change  at  the  individual  or  cultural  levels 

• Indicate existing beliefs may be wrong, incomplete, or no longer fit with current situations

•  Signify  that  all  knowledge  is  good 

2.  Coherence:  Homogeny  vs.  diversity  (individual  or  cultural).  This  dimension  refers  to 
beliefs  about  whether  it  is  acceptable  for  people  to  have  different  beliefs.  Ideas  that 
emphasized  the  importance  of  homogeny  were  coded  as  -1,  and  ideas  that  supported  diversity 
of  beliefs  were  coded  as  +1. 

Homogeny

•  Emphasize  the  importance  of  commonality  or  consistency  of  beliefs  and  knowledge, 
either in individuals’ knowledge bases or among individuals within communities at
various  levels  of  granularity. 

Diversity 

•  Stress  the  acceptability  or  even  benefit  of  diversity  in  beliefs  and  knowledge  at  the 
individual  or  group  level. 

3.  Information  Exchange:  Separation  vs.  interaction  with  other  groups.  This  dimension 
refers  to  beliefs  concerning  religious  groups’  interactions  with  non-members,  including 
beliefs  that  govern  the  extent  and  kind  of  communication  with  others  who  may  hold  different 
beliefs.  Statements  that  encouraged  separation  were  coded  as  -1,  and  statements  that 
encouraged  interaction  were  coded  as  +1. 

Separation 

•  Members  should  avoid  open  communication  with  non-members 

•  Communications  should  be  one  sided  (informing  the  other) 

•  Interactions  generally  negative 

Interaction 

•  Interaction  with  non-members  is  acceptable,  even  encouraged 

•  Interactions  should  be  positive 

4.  Judgment:  Authority  vs.  independence.  This  dimension  is  concerned  with  who  is 
sanctioned  (and  responsible)  to  engage  in  real  thinking,  reflection,  interpretation,  and 
decision  making.  Statements  that  encouraged  obedience  to  judgments  made  by  authorities 
were  coded  as  -1,  and  statements  that  encouraged  independence  in  judgment  were  coded 
as  +1. 

Authority 

•  Only  a  selected  few  should  make  appraisals  and  judgments 

•  Free  choice  of  what  religion  to  belong  to  (if  any)  is  a  bad  thing 

Independence 

•  Every  member  should  think  for  themselves 

•  Individuals  should  choose  what  religion  to  belong  to  (if  any),  no  compulsion  in  religion 

•  Members  ought  to  formulate  their  own  interpretations  of  religious  texts,  as  well  as  how 
various  teachings  would  apply  in  given  situations. 

5.  Deliberation:  Polarized  vs.  balanced.  This  dimension  indicates  the  extent  to  which  the 
writings  encourage  blind  conviction  of  one’s  beliefs  or  more  objective  thought  processes. 
Passages that encouraged polarized beliefs or actions were coded as -1, and passages that
encouraged  balanced  deliberation  or  action  were  coded  as  +1. 

Polarized 

•  Emphasize  that  one  way  is  absolutely,  100%  correct  or  good 

•  Own  goals  are  paramount,  so  any  actions  necessary  to  achieve  goals  are  reasonable 

• Accept any opinions that support one’s desired beliefs/actions

Balanced 

•  Need  to  consider  all  evidence  and  opinions 

•  Weigh  both  pros  and  cons  of  actions 

6.  Other.  The  excerpt  does  not  fit  clearly  within  any  of  the  above  categories  of  metacognitive 
belief. 

As  described  above,  coders  also  rated  the  valence  of  each  excerpt  assigned  to  a  metacognitive 
category,  as  follows: 

a. -1 for maintenance, homogeny, separation, authority, polarized

b. +1 for change, diversity, interaction, independence, balanced

We did not assign a valence code to statements coded as “6. Other.”
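
The coding scheme above can be summarized as a small lookup table. The following Python sketch is illustrative only: the names SCHEME and valence_for are hypothetical and do not come from the study materials.

```python
# Illustrative sketch of the coding scheme; all names here are hypothetical.
# Each dimension maps to its (negative pole, positive pole).
SCHEME = {
    "knowledge": ("maintenance", "change"),
    "coherence": ("homogeny", "diversity"),
    "information_exchange": ("separation", "interaction"),
    "judgment": ("authority", "independence"),
    "deliberation": ("polarized", "balanced"),
}


def valence_for(dimension, pole):
    """Return -1 or +1 for a coded excerpt, or None for the 'other' category."""
    if dimension == "other":
        return None  # excerpts coded "Other" receive no valence code
    negative, positive = SCHEME[dimension]
    if pole == negative:
        return -1
    if pole == positive:
        return 1
    raise ValueError(f"unknown pole {pole!r} for dimension {dimension!r}")
```

Keeping the two poles of each dimension in a single table makes it harder for a coder's category label and valence code to drift out of sync.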

Two  raters  independently  coded  the  webpage  excerpts.  Each  unit  was  coded  according  to  one  of 
the  five  metacognitive  ideas  expressed  in  the  text,  as  described  above.  In  addition,  the  coding 
also  captured  the  valence  of  these  metacognitive  ideas.  The  coders’  initial  ratings  were  used  to 
determine  inter-rater  agreement.  They  held  meetings  to  discuss  disagreements,  and  determined 
the  final  set  of  codes  by  consensus.  Cohen’s  Kappa  was  used  to  compute  inter-rater  reliability  for 
the  metacognitive  categories,  as  it  takes  into  account  agreement  by  chance.  The  inter-rater 
reliability for metacognitive categories for moderate text was κ = .54 and κ = .43 for the
extremist  text.  Both  of  these  numbers  fall  within  the  moderate  range  of  reliability  (Landis  & 
Koch,  1977).  The  inter-rater  reliability  for  valence  was  quite  high  (99%  agreement  for  moderate 
text  and  97%  for  extremist  text). 
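
For readers unfamiliar with the statistic, the sketch below shows one way Cohen's Kappa can be computed from two raters' category codes. It is a minimal illustration, not the study's actual analysis script; the function name and the example codes are made up.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of category labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of units")
    n = len(rater_a)
    # Observed agreement: proportion of units given the same code by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over categories of the product of marginal proportions.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Made-up codes for six excerpts (not study data).
codes_a = ["knowledge", "judgment", "knowledge", "coherence", "judgment", "knowledge"]
codes_b = ["knowledge", "judgment", "judgment", "coherence", "judgment", "knowledge"]
kappa = cohens_kappa(codes_a, codes_b)  # 17/23, roughly 0.74
```

In this toy example the raters agree on five of six excerpts (83%), yet kappa is lower because some of that agreement would be expected by chance alone.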

Relative  Frequency  of  Metacognitive  Categories.  The  percentage  of  excerpts  assigned  to  each 
of the five metacognitive categories is displayed in Figure 2. As shown in Figure 2, the metacognitive
ideas  included  in  moderate  texts  tended  to  emphasize  knowledge  and  judgment,  followed  by 
information  exchange.  On  the  extremist  side,  information  exchange  was  the  most  prevalent  kind 
of  metacognitive  idea  discussed,  with  knowledge  and  judgment  following.  Coherence  and 
deliberation  were  the  least  frequently  used  categories  for  each,  and  were  essentially  nonexistent 
within  the  moderate  documents. 

Figure 2. Percentage of excerpts by metacognitive category.

Valence  of  Each  Category.  Overall,  the  average  valence  for  the  moderate  text  was  .87,  and  the 
average  valence  for  the  extremist  text  was  -.42.  The  variability  in  metacognitive  categories  and 
valence across the moderate and extremist text was significantly greater than chance, χ²(13) =
271.61,  p  <  .001.  This  indicates  that  excerpts  from  moderate  websites  tended  to  contain  beliefs 
that  encouraged  cultural  change,  diversity  of  beliefs,  interactions  between  Muslim  and  non- 
Muslim  groups,  independence  in  judgments,  and  balanced  thought  processes.  In  contrast, 
excerpts  from  extremist  websites  provided  more  support  for  knowledge  maintenance,  homogeny 
of  beliefs,  separation  between  groups,  and  obedience  to  judgments  made  by  authorities.  Figure  3 
displays  the  averages  of  the  positively  and  negatively  valenced  metacognitive  codes  for  the 
excerpts  from  moderate  and  extremist  websites. 

Figure 3. Mean valence by metacognitive category and document type.
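
The descriptive summaries reported above (percentage of excerpts per category and mean valence per document type) could be computed along the following lines. This is a sketch under stated assumptions: the tuple layout, the function name summarize, and the example records are hypothetical, and percentages are taken over whatever categories are passed in.

```python
from collections import defaultdict


def summarize(excerpts):
    """Summarize (source, category, valence) records.

    valence is -1, +1, or None (None for excerpts without a valence code).
    Returns (percentages by source and category, mean valence by source).
    """
    counts = defaultdict(lambda: defaultdict(int))
    valences = defaultdict(list)
    for source, category, valence in excerpts:
        counts[source][category] += 1
        if valence is not None:
            valences[source].append(valence)
    percentages = {
        source: {cat: 100.0 * k / sum(cats.values()) for cat, k in cats.items()}
        for source, cats in counts.items()
    }
    mean_valence = {source: sum(v) / len(v) for source, v in valences.items()}
    return percentages, mean_valence


# Made-up records (not study data).
records = [
    ("moderate", "knowledge", 1),
    ("moderate", "judgment", 1),
    ("moderate", "knowledge", -1),
    ("moderate", "information_exchange", 1),
    ("extremist", "information_exchange", -1),
    ("extremist", "knowledge", -1),
    ("extremist", "judgment", 1),
    ("extremist", "information_exchange", -1),
]
pct, means = summarize(records)
```

Because valence is coded as -1 or +1, the mean valence for a document type ranges from -1 (all excerpts at the negative pole) to +1 (all at the positive pole).
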

A 2 (valence: positive or negative) x 2 (website source: extremist or moderate) contingency
chi-square was calculated for each metacognitive dimension in order to determine whether there was
a  significant  difference  in  the  valence  of  metacognitive  beliefs  presented  on  the  moderate  and 
extremist  websites.  These  results  are  described  below. 
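
As an illustration of the test just described, the sketch below computes a Pearson chi-square statistic and p-value for a 2 x 2 table of counts. It assumes no continuity correction, since the report does not say whether one was applied; the function name and the example counts are made up.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square (df = 1, no continuity correction) for a 2 x 2 table.

    table is [[a, b], [c, d]], e.g. rows = website source, columns = valence.
    Returns (chi2, p).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one observation")
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With one degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Made-up counts (not study data): rows = extremist, moderate;
# columns = negative valence, positive valence.
chi2, p = chi_square_2x2([[20, 5], [5, 20]])  # chi2 is 18.0 for these counts
```

With small expected cell counts, an exact test or a continuity correction may be more appropriate than the uncorrected statistic shown here.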

Knowledge:  Maintenance  vs.  change.  For  the  knowledge  maintenance  vs.  change  dimension, 
excerpts  from  extremist  websites  (88%)  were  significantly  more  likely  to  support  knowledge 
maintenance than were excerpts from moderate websites (7%), χ²(1) = 71.26, p < .001. For
example, “innovation is misguidance” is an excerpt from an extremist website that encourages
knowledge maintenance and abiding by traditions. In contrast, here is an excerpt from a
moderate website that supports knowledge change: “Religion becomes a dynamic and evolving
concept, rooted in history but given various interpretations suited to our times and addressing
our problems and challenges.”

Coherence:  Homogeny  vs.  diversity.  For  the  homogeny  vs.  diversity  dimension,  excerpts  from 
extremist  websites  (80%)  were  significantly  more  likely  to  mention  support  for  homogeny  of 
beliefs than were excerpts from moderate websites (0%), χ²(1) = 9.6, p = .002. Excerpts from
extremist websites that supported homogeny of beliefs included statements such as:

“To accept the idea of pluralism means that you do not care much
about religion.”

“The ultimate goal of Jihad in Islam is for no religion to remain on Earth
except Islam.”

In  contrast,  the  following  excerpts  from  moderate  websites  accept  and  encourage  diversity: 

“Just as men have invented different languages to talk to each other, so they
have invented different religions to talk to God, and God understands them all
well enough.”

“In a world as large as Islam, in a history as long as that of Islam, legitimate
differences in doctrine and practice and should be expected and appreciated.”

Information  exchange:  Separation  vs.  interaction  with  other  groups.  For  the  separation  vs. 
interaction  dimension,  excerpts  from  extremist  websites  (64%)  were  significantly  more  likely  to 
support  separation  between  Muslim  and  non-Muslim  groups  than  were  excerpts  from  moderate 
websites (2%), χ²(1) = 37.67, p < .001. Here are examples of excerpts from extremist websites
that  advocate  separation  between  groups: 

“Believing Muslims must keep themselves highly separated from non-believers.”

“Do not pray after a Jew or a Christian, do not eat slaughtered meat from them,
do not attend their funerals, and do not take care of their sick.”

The excerpts from moderate websites primarily encouraged interaction between groups:

“Allah does not forbid you (Muslims) to deal justly and kindly with those who
have not fought against you in accounts of your religion,”

“The future of Islam depends on... not in hiding or retreating but in reaching out
and working with people of other faiths to improve the life of all Muslims and
Non-Muslims alike.”

Judgment: Authority vs. independence. For the authority vs. independence dimension, excerpts
from extremist websites (96%) were significantly more likely to state that judgments should be
made only by authorities than were excerpts from moderate websites (8%), χ²(1) = 68.35,
p < .001. Extremist websites encouraged people to obey judgments made by authority: “Obey
whoever is placed in authority over you,” and “the gate of Ijtihad is closed.” In contrast,
excerpts from moderate websites tended to encourage people to form their own judgments: “It is
good to discuss and debate religious ideas and interpretations of the Koran” and “Ijtihad
(critical thinking to come up with solutions to new problems) should always be available.”

Deliberation: Polarization vs. balance. For the polarized vs. balanced dimension, excerpts from
extremist websites (87%) and moderate websites (100%) both encouraged balanced thought
processes and actions. There was not a significant difference in the valence of excerpts from
moderate and extremist websites on this dimension, χ²(1) = 0.74, p = .389. While some extremist
websites supported polarized beliefs, such as “One of the most eloquent symptoms of the moral
bankruptcy of Western culture is a certain fashionable attitude toward moral issues, best
summarized as: ‘There are no blacks and whites; there are only grays,’” the majority of
moderate and extremist websites encouraged balanced thought processes: “One needs to know of
something called balance and justice... people should look for the truth from themselves and from
other than them.”

Model  of  Metacognitive  Values.  Based  on  the  results  of  Study  1,  we  limited  the  model  of 
metacognitive  values  to  include  the  dimensions  of  knowledge,  judgment,  coherence,  and 
information  exchange.  Beyond  theoretical  appeal,  each  of  these  showed  promising  results  in 
terms  of  exhibiting  distinctive  value  polarities  for  documents  taken  from  extremist  and  moderate 
websites.  Deliberation  was  eliminated  from  the  model,  as  it  was  not  found  to  differ  by  document 
source. Figure 4 displays the final model.

Figure 4. Four Dimension Model of Metacognitive Values embedded in religious ideology.

In  Study  2,  we  seek  to  further  confirm  or  disconfirm  this  model  by  using  computational  text 
analysis  methods  to  aid  in  analyzing  the  corpus.  Such  methods  remain  at  an  early  research  phase 
of  development,  and  suffer  from  several  problems.  Most  contemporary  information  extraction 
approaches  for  cultural  modeling,  including  lexical  co-occurrence  methodologies  (Bardi, 
Calogero,  &  Mullen,  2008),  are  limited  to  the  recognition  of  specific  entities  based  on  lexical 
criteria,  such  as  nouns,  names  of  people,  places  and  dates.  These  include  a  range  of  “bottom-up” 
statistical techniques, such as Latent Dirichlet Allocation (see Appendix B for more details on
one such approach; Penta, Shadbolt, Smart, & Sieck, 2011, November). Ontologies, however,
can enhance information extraction capabilities by supporting the identification of relations
between entities. In Study 2, we exploited this capability by describing and using ontology
representations of the metacognitive ideas in the computational text analyses. The development
of  this  ontology  approach  for  automated  measurement  of  cultural  values  represents  a  significant 
contribution  of  the  current  project,  and  so  the  background  is  described  in  detail  prior  to  revisiting 
the substantive results related to the metacognitive values model.

STUDY 2: CONFIRMATORY CULTURAL ANALYSIS USING
ONTOLOGY-BASED INFORMATION EXTRACTION

In  Study  2,  we  focus  on  an  approach  to  knowledge  extraction  that  has  attracted  considerable 
attention  in  the  knowledge  engineering  community.  This  is  the  approach  of  Ontology-Based 
Information  Extraction  (OBIE;  see  Wimalasuriya  &  Dou,  2010).  OBIE  relies  on  the  exploitation 
of  background  knowledge  as  part  of  the  knowledge  extraction  process.  Unlike  bottom-up 
techniques,  it  does  not  strive  to  identify  a  set  of  topics  or  conceptual  categories  from  a  source 
text.  Instead,  it  starts  with  a  set  of  predefined  knowledge  structures  (e.g.,  concepts)  and  attempts 
to  find  instances  of  these  structures  in  the  source  text.  One  of  the  advantages  of  OBIE  is  that  it 
focuses attention on a core set of entities that are relevant to a particular problem-solving activity.
Thus,  whereas  bottom-up  techniques  may  yield  outputs  that  have  little  relevance  to  cultural 
analysis,  the  use  of  OBIE  techniques  focuses  attention  on  just  those  knowledge  structures  that 
are  relevant  to  the  interests  and  concerns  of  a  cultural  analyst. 
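
The top-down character of OBIE can be sketched in a few lines of Python. The concept names loosely follow dimensions retained from Study 1, but the cue phrases and the `find_concept_instances` helper are hypothetical placeholders chosen for illustration; they are not the IEXTREME Ontology or the extraction rules used in the study.

```python
import re

# Predefined knowledge structures: concept -> illustrative cue phrases (assumed).
CONCEPT_CUES = {
    "knowledge_maintenance": ["innovation is misguidance", "abide by tradition"],
    "judgment_authority": ["obey whoever", "placed in authority"],
    "information_exchange_separation": ["keep themselves highly separated"],
}

def find_concept_instances(text, concept_cues=CONCEPT_CUES):
    """Return (concept, matched phrase, start offset) for every cue found in text."""
    hits = []
    for concept, cues in concept_cues.items():
        for cue in cues:
            for match in re.finditer(re.escape(cue), text, flags=re.IGNORECASE):
                hits.append((concept, match.group(0), match.start()))
    return sorted(hits, key=lambda hit: hit[2])

sample = "Obey whoever is placed in authority over you."
for concept, phrase, offset in find_concept_instances(sample):
    print(concept, repr(phrase), offset)
```

The point of the sketch is the direction of processing: the concepts are fixed in advance and the text is searched for their instances, rather than categories being induced from the text.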

As its name suggests, OBIE requires the presence of a domain ontology that can be used by
machine-based  information  extraction  processors.  The  word  ‘ontology’  has  two  meanings.  In 
philosophy,  the  term  is  used  to  refer  to  the  study  of  the  nature  of  being,  existence,  or  reality. 
Ontology  thus  deals  with  questions  concerning  the  existence  of  entities,  the  similarities  and 
differences  between  entities  and  the  relationships  between  entities.  In  computer  science,  the  word 
‘ontology’  has  a  somewhat  different  meaning.  In  this  case,  it  denotes  an  artifact  that  functions  as 
a  representation  of  the  entities  in  some  domain.  An  ontology  could  thus  be  created  to  represent 
the  entities  in  domains  such  as  biomedicine  or  aircraft  engineering.  In  fact,  it  is  probably  more 
appropriate  to  say  that  the  elements  of  an  ontology  represent  the  human  understanding  of  a 
domain  rather  than  the  entities  within  a  domain.  Thus,  ontologies  serve  as  external 
representations  of  the  concepts  that  humans  entertain  when  talking  or  thinking  about  particular 
things.  This  makes  ontologies  suited  to  represent  the  knowledge  that  humans  have  about  a 
domain.  Additionally,  because  ontologies  typically  avail  themselves  of  logical  formalisms,  they 
are  often  seen  as  a  means  by  which  human  knowledge  and  understanding  can  be  represented  in  a 
machine-accessible  format. 

A domain ontology thus provides a formal, machine-readable representation of domain
knowledge  in  some  target  domain,  and  it  therefore  supports  the  exploitation  of  domain 
knowledge  as  part  of  some  automated  process  (in  the  case  of  OBIE  the  automated  process  is,  of 
course,  the  detection,  identification  and  extraction  of  specific  instances  of  the  elements  defined 
in  the  ontology).  The  development  of  a  domain  ontology  is  thus  a  crucial  part  of  the  OBIE 
process.  In  the  following  sections,  we  first  present  some  background  on  tools  and  techniques  that 
we  reviewed  to  support  OBIE.  Then,  we  provide  an  overview  of  the  ontology  that  was  developed 
to  support  the  OBIE  process  employed  in  Study  2,  which  is  called  the  IEXTREME  Ontology. 

General  Approach 

OBIE  Background.  The  development  of  a  Web-based  knowledge  extraction  system  is  based 
on  a  decade  of  research  into  Ontology-Based  Information  Extraction  (OBIE)  systems  (see 
Wimalasuriya  &  Dou,  2010).  This  section  describes  the  supporting  tools  and  techniques  we 
reviewed for possible adaptation or extension for use in Study 2.

The first tool considered, MnM2 (Vargas-Vera et al., 2002), is an annotation tool that provides
both  automated  and  semi-automated  support  for  annotating  web  pages  with  semantic  contents. 
MnM  integrates  a  web  browser  with  an  ontology  editor  and  provides  open  APIs  to  link  to 
ontology  servers  and  for  integrating  information  extraction  tools.  There  are  five  main  activities 
supported  by  the  MnM  tool.  These  include: 

•  Browsing:  a  specific  set  of  knowledge  elements  is  chosen  from  a  library  of 
knowledge  models  on  an  ontology  server 

•  Mark-up: a corpus of documents is manually marked up using the selected
knowledge  elements 

•  Learning: a learning algorithm is run over the marked up corpus to learn the
extraction  rules 

•  Testing:  the  IE  mechanism  is  run  over  the  text  corpus  to  assess  its  precision 
and  recall  measures 

•  Extraction:  an  IE  mechanism  is  selected  and  run  over  a  set  of  documents 
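
The precision and recall measures mentioned for the testing activity can be sketched as follows. This is a generic, exact-match formulation rather than MnM's own scoring code, and the annotation tuples are hypothetical.

```python
def precision_recall(predicted, gold):
    """Compare two collections of (start, end, label) annotations by exact match."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical annotations: (start offset, end offset, ontology class)
gold = {(0, 6, "Person"), (18, 25, "Place"), (30, 47, "Date")}
predicted = {(0, 6, "Person"), (18, 25, "Organization")}
precision, recall = precision_recall(predicted, gold)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

Precision is the share of extracted annotations that are correct, and recall is the share of manually marked annotations that the extraction mechanism recovers.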

A second important tool, OntoMat,3 is a user-friendly interactive Web page annotation tool. It
supports the user in creating and maintaining ontology-based semantic annotations. It includes an
ontology browser for the exploration of the ontology and instances and an HTML browser that
displays the annotated parts of the text. Melita4 is an ontology-based text annotation tool.
There are two frames in the interface. The left frame is the ontology representing the annotations
that can be inserted. A specific color is associated with each node in the ontology. The document to
be  annotated  can  be  loaded  in  the  right  frame.  Text  can  be  annotated  by  selecting  text  with  the 
mouse,  and  then  clicking  on  the  appropriate  node  in  the  ontology  pane. 

SemanticWord5  is  an  environment  based  in  Microsoft  Word  that  integrates  a  variety  of  semantic 
annotation  capabilities.  There  are  two  types  of  annotations  in  SemanticWord:  instance  references 
and  triple  bags.  An  instance  reference  associates  a  text  region  with  an  instance  of  a  class.  Triple 
bags  describe  the  content  of  a  text  region  with  a  collection  of  triples.  In  this  case,  the  subject  of 
the  triple  is  an  instance,  the  predicate  of  the  triple  is  a  property  defined  in  an  ontology,  and  the 
object  of  the  triple  can  be  either  an  instance  or  a  value.  The  Choosers  feature  (see  Figure  5)  of 
SemanticWord  enables  triple  cells  to  be  filled  by  dragging  and  dropping  instance  references  and 
by  picking  instances  and  properties  from  the  Choosers  interface.  Choosers  can  use  the  values 
already  stored  in  a  triple  to  constrain  the  lists  of  choices  offered  to  the  user. 
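
The triple structure and the constraining behavior of the Choosers can be sketched as a small data model. The class names, property names, and helper functions below are hypothetical and only illustrate the idea; they do not reproduce SemanticWord's implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Property:
    name: str
    domain_cls: str  # class expected for the triple's subject
    range_cls: str   # class (or datatype) expected for the triple's object

@dataclass(frozen=True)
class Instance:
    name: str
    cls: str

@dataclass(frozen=True)
class Triple:
    subject: Instance
    predicate: Property
    obj: object      # an Instance or a literal value

# Hypothetical ontology fragment, for illustration only.
PROPERTIES = [
    Property("memberOf", "Person", "Organization"),
    Property("bornOn", "Person", "Date"),
    Property("locatedIn", "Organization", "Place"),
]

def choose_properties(subject, properties=PROPERTIES):
    """Offer only the properties whose domain matches the subject's class."""
    return [p for p in properties if p.domain_cls == subject.cls]

def choose_objects(predicate, instances):
    """Offer only the instances whose class matches the predicate's range."""
    return [i for i in instances if i.cls == predicate.range_cls]

renoir = Instance("Renoir", "Person")
salon = Instance("Salon", "Organization")
options = choose_properties(renoir)
print([p.name for p in options])
# A "triple bag" is simply a collection of triples describing one text region.
triple_bag = [Triple(renoir, options[0], choose_objects(options[0], [renoir, salon])[0])]
print(triple_bag[0].predicate.name, triple_bag[0].obj.name)
```

Filtering by the values already placed in a triple keeps the lists short and prevents the user from building triples that the ontology would not allow.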


2 http://kmi.open.ac.uk/projects/akt/MnM/

3  http://annotation.semanticweb.org/ontomat/ 

4  http://www.dcs.shef.ac.uk/~alexiei/WebSite/University/Melita/index.html 

5 http://mr.teknowledge.com/DAML

Figure 5. SemanticWord chooser panes (Property Chooser and Instance Chooser).

SemanticWord  also  provides  Personal  Class  Toolbars  for  generating  both  content  and 
annotations  together  with  just  one  mouse  click.  Users  can  create  any  number  of  Personal  Class 
Toolbars,  each  one  of  them  tied  to  a  single  class.  Each  personalized  class  toolbar  contains  an 
instance  selection  combo  box  and  buttons  to  create  instance  references  corresponding  to  the 
selected  instance  or  a  new  one.  A  class  cascading  menu  includes  an  entry  for  every  named  class 
in  the  ontology  attached  to  the  document.  This  menu  gives  users  access  to  most  of  the  operations 
related  to  ontology  classes,  including  defining  new  instances,  creating  personal  class  toolbars, 
and  opening  instance  choosers.  When  a  user  executes  any  of  these  functions  from  this  menu,  the 
menu  entry  corresponding  to  the  selected  class  is  duplicated  and  placed  at  the  top  of  the  menu  so 
the user can access it easily the next time they need it. Finally, SemanticWord integrates an
information  extraction  system.  As  the  user  types  the  content  of  the  document,  a  background 
thread  feeds  new  or  modified  text  to  the  information  extraction  system  in  paragraph  units 
(roughly),  obtains  the  extracted  entities  with  their  position  in  the  text,  and  underlines  those  text 
regions  with  a  blue  wiggly  line.  This  procedure  resembles  Word  spelling  and  grammar  checking. 
The  user  can  examine  the  extracted  entities  and  convert  them  into  instance  reference  annotations. 

Semantic  mark-up  is  a  plug-in  to  Internet  Explorer  that  supports  semantic  annotation.  When  a 
page  is  loaded  into  the  Web  browser,  the  plug-in  scans  the  page  to  see  if  it  contains  any  existing 
semantic  mark-up.  If  so,  the  plug-in  identifies  the  DAML+OIL  types,  properties,  and  instances 
used  in  that  semantic  mark-up  and  it  displays  a  toolbar  containing  buttons  corresponding  to  these 
elements.  Any  semantic  annotations  found  within  a  page  are  displayed  on  the  same  page  as  a 
Semantic  Mark-up  Table.  This  shows  either  the  single  triple  containing  the  selected  concept  or 
all  the  triples  contained  in  the  page. 

Amilcare is an adaptive information extraction (IE) system designed to support document
annotation within the framework of the semantic web (Ciravegna & Wilks, 2003). It
was developed in the context of the AKT initiative and is based on machine learning (ML)
technology that generates rules for semantic mark-up of both unstructured (e.g., free text)
and structured (e.g., XML/HTML) text sources. The rules are learned by generalizing over a
set of examples found in a training corpus annotated with XML elements. Amilcare’s ML
technology produces two types of rules:

1. Rules that insert semantic annotations into the text, and

2.  Rules  that  correct  mistakes  and  imprecision  in  the  annotations  provided  by  (1). 

A  tagging  rule  is  composed  of  a  left-hand  side  containing  a  pattern  of  conditions  on  a  connected 
sequence of words, and a right-hand side that is an action inserting an XML element into the text
corpus. Amilcare’s default architecture includes the connection with ANNIE, GATE’s6 shallow IE
system, which performs tokenization, part-of-speech tagging, gazetteer lookup and named entity
recognition. The architecture, however, is flexible enough to allow any other pre-processor to
connect via the API. The pre-processor is also the only language-dependent module, the rest of
the system being language independent (experiments were performed in English and Italian).
Amilcare thus provides a platform for automated semantic annotation that is sufficiently flexible
to deal with both free text documents and highly structured web resources (e.g., web pages,
XML files).

Amilcare  works  in  three  modes:  training  mode,  test  mode,  and  production  mode.  In  training 
mode,  Amilcare  can  be  used  to  learn  semantic  annotation  rules  that  provide  the  basis  for 
information  extraction.  Amilcare  is  based  on  the  Learning  Patterns  via  Language  Processing 
(LP2)  algorithm  (Ciravegna,  2001),  which  is  a  supervised  algorithm  that  falls  into  a  class  of 
Wrapper  Induction  Systems  using  LazyNLP.  LP2  induces  symbolic  rules  that  insert  SGML  tags 
into text by learning from examples in a user-defined tagged corpus. Induction is performed by
generalizing  from  the  examples  in  the  corpus.  As  part  of  the  generalization  process,  LP2  uses 
generic  shallow  knowledge  about  natural  language  as  provided  by  a  morphological  analyzer,  a 
part-of-speech  tagger  and  a  gazetteer.  The  rules  generated  by  LP2  consist  of  a  left-hand  side, 
containing  a  pattern  of  conditions  on  a  connected  sequence  of  words,  and  a  right-hand  side, 
which  inserts  an  SGML  tag  into  the  source  text. 
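
The shape of such a rule can be sketched as follows. The two rules below are written by hand purely for illustration; they stand in for rules that LP2 would induce from a tagged corpus, and the `TaggingRule` class is a hypothetical simplification that does not reflect Amilcare's internal representation.

```python
from dataclasses import dataclass
from typing import Callable, List

Condition = Callable[[str], bool]

@dataclass
class TaggingRule:
    conditions: List[Condition]  # left-hand side: one condition per consecutive token
    insert_before: int           # position within the matched window for the tag
    tag: str                     # right-hand side: the tag to insert

    def apply(self, tokens: List[str]) -> List[str]:
        out = list(tokens)
        width = len(self.conditions)
        # Scan right to left so that insertions never shift positions still to be checked.
        for start in range(len(tokens) - width, -1, -1):
            window = tokens[start:start + width]
            if all(cond(tok) for cond, tok in zip(self.conditions, window)):
                out.insert(start + self.insert_before, self.tag)
        return out

def is_capitalized(token: str) -> bool:
    return token[:1].isupper()

def equals(word: str) -> Condition:
    return lambda token: token.lower() == word

# Hand-written example rules (assumed, not induced): one inserts the opening tag,
# the other inserts the closing tag.
open_rule = TaggingRule([equals("born"), equals("in"), is_capitalized], 2, "<place>")
close_rule = TaggingRule([is_capitalized, equals(".")], 1, "</place>")

tokens = "Renoir was born in Limoges .".split()
tagged = close_rule.apply(open_rule.apply(tokens))
print(" ".join(tagged))
```

Treating opening and closing tags as separate rules mirrors the description above, in which each rule's action inserts a single tag into the text.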

Once the annotation rules have been learned, Amilcare can be run in test mode, in which the
induced rules are tested on an unseen tagged corpus. By applying the learned rules in this
mode, the analyst is able to evaluate how well the system performs. When running in test mode,
Amilcare first removes all the annotations from the corpus, then re-annotates the corpus using
the induced rules. Finally, the new annotations are automatically compared with the original
annotations and the results are presented to the user.
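
The strip, re-annotate, and compare cycle of test mode can be sketched as below. The `toy_annotator` function is a hypothetical stand-in for the induced rules, and the tag handling assumes simple, well-formed tags without attributes; none of this is Amilcare's actual code.

```python
import re

TAG = re.compile(r"<(/?)(\w+)>")

def split_annotations(tagged):
    """Return (plain_text, spans), where spans is a set of (start, end, label)
    offsets into plain_text. Assumes well-formed tags without attributes."""
    plain, spans, open_tags = [], set(), {}
    pos = cursor = 0
    for m in TAG.finditer(tagged):
        chunk = tagged[cursor:m.start()]
        plain.append(chunk)
        pos += len(chunk)
        closing, label = m.group(1) == "/", m.group(2)
        if closing:
            spans.add((open_tags.pop(label), pos, label))
        else:
            open_tags[label] = pos
        cursor = m.end()
    plain.append(tagged[cursor:])
    return "".join(plain), spans

def evaluate(tagged_gold, annotate):
    """Strip the gold tags, re-annotate the plain text, and compare the spans."""
    plain, gold = split_annotations(tagged_gold)
    _, predicted = split_annotations(annotate(plain))
    return {
        "correct": gold & predicted,
        "missed": gold - predicted,
        "spurious": predicted - gold,
    }

# Hypothetical stand-in for the induced rules: tags only the word 'Limoges'.
def toy_annotator(text):
    return text.replace("Limoges", "<place>Limoges</place>")

report = evaluate("<person>Renoir</person> was born in <place>Limoges</place>.", toy_annotator)
print(report)
```

The three sets in the report are the raw material for summary measures such as precision and recall.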

The  production  mode  is  used  when  rule  learning  and  evaluation  are  complete.  It  annotates  new 
documents  using  the  available  rules  and  makes  the  annotated  documents  available  for  subsequent 
processing  steps,  e.g.,  knowledge  extraction  and  knowledge  consolidation.  Amilcare  has  been 
integrated into a number of other semantic annotation systems, including MnM, OntoMat and
Melita,  as  described  above. 

IE systems such as Amilcare can be used to enhance the recognition of entities in a document,
for example, that ‘Rembrandt’ is a person; however, such information is not very useful without
the ability to derive the relationships between those entities, e.g., that Rembrandt was born on a
certain date and is the creator of particular artworks. Extracting such relations automatically
allows us to capture more complete knowledge to populate the ontology and is essential in
information fusion contexts where relevant information is dispersed across multiple source
documents. In order to capture relational information, we exploited the successes of work within
the AKT initiative, particularly with respect to systems like Artequakt (Alani, Kim, Millard,
Weal, Hall, Lewis, et al., 2003a), which attempts to identify entity relationships using ontology
relation declarations and lexical information (“The Protege Axiom Language” 2005).

Artequakt (Alani et al., 2003a; Alani et al., 2003b; Alani et al., 2003c) is a system for
ontology-based knowledge extraction that was developed at the University of Southampton as part of the
AKT initiative (Shadbolt et al., 2004). It combines the use of a domain ontology with a number
of supporting tools and resources, including the general-purpose lexical database WordNet and
the  GATE  NLP  system.  In  this  case,  GATE  is  used  as  a  syntactical  pattern-matching  entity 
recognizer,  and  WordNet  is  used  as  a  supplementary  information  source  in  order  to  identify 
entities  that  are  not  recognized  by  the  GATE  system.  The  aim  in  Artequakt  is  to  identify  and 
extract  knowledge  fragments  from  a  set  of  Web-based  textual  resources,  using  both  a  domain 
ontology  and  linguistic  resources  as  background  knowledge  for  the  extraction  process.  The 
specific  steps  involved  in  the  knowledge  extraction  process  are  detailed  below,  and  Figure  6 
illustrates  the  overall  process. 


Figure  6.  Artequakt’s  knowledge  extraction  process. 

1.  Syntactic  Analysis:  Each  Web-based  resource  (i.e.,  HTML  document)  is  first  subjected 
to syntactic analysis using the Apple Pie Parser. The Apple Pie Parser is a bottom-up
probabilistic chart parser derived from the Penn Tree Bank syntactically-tagged corpus,
and it is used in the Artequakt system to generate syntactic annotations of the source text.
For example, the parser is used to identify that ‘Renoir’ is a noun, while ‘was’ is a verb.

2. Sentence Decomposition: The sentences in the Web document are then analyzed and
complex/compound  sentences  are  converted  into  simple  sentences.  This  step  is  important 
because  relation  extraction  in  Artequakt  centers  on  the  analysis  of  individual  sentences. 

3. (Named) Entity Recognition: GATE and WordNet are then used to identify the named
entities mentioned in the text. For example, the system highlights the fact that ‘Renoir’ is
a Person and ‘Limoges’ is a location. This step is not necessarily independent of the
aforementioned syntactic analysis step. In some cases, it may be important to know that a
specific element corresponds to a noun before it can be identified as a particular type of
entity.

4.  Resolution  of  Anaphoric  References:  GATE  is  used  to  resolve  anaphoric  references  of 
singular personal pronouns. For example, a sentence beginning ‘He was born...’ may be
replaced with a sentence beginning ‘Pierre-Auguste Renoir was born on...’

5.  Addition  of  Missing  Subjects:  In  some  cases,  the  subject  for  a  particular  sentence  may 
be  missing.  This  step  seeks  to  identify  and  add  missing  subjects  where  appropriate. 

6.  Concept  Extraction:  The  next  step  is  to  establish  a  mapping  between  the  elements  of  the 
domain  ontology  and  the  source  text.  For  example,  in  the  case  of  a  sentence  that  features 
as  its  subject  the  artist  ‘Renoir,’  we  want  to  establish  a  mapping  between  the  text 
fragment  denoting  the  artist  ‘Renoir’  and  the  ontology  element  (instance)  denoting  the 
same artist. To accomplish this, Artequakt engages in an ontology mapping process that
takes  annotated  elements  of  the  source  text  document  and  maps  them  to  elements  of  the 
domain  ontology.  In  many  cases,  of  course,  this  process  is  complicated  by  the  fact  that 
the  labels  assigned  to  ontology  elements  seldom  match  the  corresponding  text  for  a 
particular  entity.  As  such,  Artequakt  uses  a  term  expansion  technique  that  uses 
WordNet-based  lexical  chains  to  expand  entity  names  in  order  to  increase  the  chances  of 
finding  a  match.  For  example,  GATE  identifies  ‘Museum  of  Art’  as  an  ‘Organization’, 
while  the  Artequakt  ontology  defines  ‘Legal  Body’  as  a  general  concept  for 
organizations. In order to work out that, with respect to the Artequakt ontology,
‘Museum of Art’ is a ‘Legal Body’, Artequakt expands the lexical chains associated with
‘Organization’  in  WordNet  and  compares  these  with  the  labels  assigned  to  elements  in 
the  Artequakt  ontology. 

7.  Relation  Extraction:  The  final  step  in  the  knowledge  extraction  process  is  relation 
extraction.  Relation  extraction  in  Artequakt  centers  on  individual  sentences.  The  aim  is  to 
extract  relationships  between  a  pair  of  entities  within  a  given  sentence.  Relations  are 
extracted  by  matching  the  verb  and  entity  pairs  found  in  each  sentence  with  relations  and 
concepts  pairs  as  asserted  in  the  domain  ontology.  As  was  the  case  with  the  concept 
extraction  step,  a  number  of  lexical  chains  (synonyms,  hypernyms  and  hyponyms)  from 
WordNet are used to support the mapping between a text fragment (usually
corresponding to a verb, e.g., ‘born’) and an ontology relation (e.g., ‘date_of_birth’).
For the sentence in Figure 6, the main verb, ‘born’, matches two relations in the
ontology: ‘date_of_birth’ and ‘place_of_birth.’ The choice between these two relations in
a given sentential context depends on the range of the relation in the ontology and the
recognized type of the verb object in the sentence. Thus, ‘date_of_birth’ is selected when
the sentence object is ‘February 25, 1841’ because this is recognized as a type of ‘Date’,
and the Artequakt ontology specifies that the range of the ‘date_of_birth’ relation is of
type ‘Date.’ Similarly, when the object of the verb is ‘Limoges’, then Artequakt selects
the ‘place_of_birth’ relation. This is because the range of the ‘place_of_birth’ relation is
of type ‘Place’, and ‘Limoges’ is correctly recognized as a type of ‘Place.’
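
The range-based choice described in the relation extraction step can be sketched as a simple lookup. The two dictionaries are hypothetical fragments written for illustration (including the underscore spelling of the relation names); Artequakt itself derives the verb-to-relation mapping from WordNet lexical chains rather than from a hand-built table.

```python
# Hypothetical fragments of an ontology and a verb lexicon, for illustration only.
RELATION_RANGE = {"date_of_birth": "Date", "place_of_birth": "Place"}
VERB_TO_RELATIONS = {"born": ["date_of_birth", "place_of_birth"]}

def select_relation(verb, object_type):
    """Pick the candidate relation whose range matches the recognized object type."""
    for relation in VERB_TO_RELATIONS.get(verb, []):
        if RELATION_RANGE[relation] == object_type:
            return relation
    return None  # no relation is asserted when nothing fits

# (verb, object text, entity type assigned by the entity recognizer)
for verb, obj, obj_type in [("born", "February 25, 1841", "Date"),
                            ("born", "Limoges", "Place")]:
    print(obj, "->", select_relation(verb, obj_type))
```

Returning `None` when no range matches reflects the fact that a verb match alone is not enough to assert a relation.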

One  of  the  relatively  unique  features  of  the  Artequakt  system  relates  to  its  attempt  to  extract 
relational  information  from  unstructured  textual  resources.  Thus,  while  most  knowledge 
extraction  systems  focus  on  the  identification  of  entities  in  a  source  text,  Artequakt  attempts  to 
identify  the  relations  that  link  various  entities  together  in  a  coherent  semantic  network.  This  is  an 
important  aim  because  much  of  the  knowledge  provided  by  a  resource  is  often  contained  in  the 
relationships  between  the  entities  mentioned  in  a  text.  While  it  might  be  important  to  recognize 
that  specific  entities,  such  as  a  person  and  an  organization,  are  mentioned  in  a  text,  what  is  often 
more  important  is  the  specific  relationship  between  these  entities  (such  as  the  fact  that  the  person 
is  a  member  of  the  organization).  Artequakt  attempts  to  identify  these  relationships  by  using  the 
ontology  as  a  background  knowledge  resource.  In  particular,  the  structure  of  the  ontology 
enables  the  Artequakt  system  to  establish  expectations  about  the  likely  set  of  relationships  that 
might exist between any two entities, and these expectations can guide the analysis of
relation-relevant linguistic content. Because this capability provided important background to the
technical  approach  adopted  within  Study  2,  it  helps  to  focus  on  a  concrete  example  of  how  the 
Artequakt  approach  to  relational  knowledge  extraction  extends  to  application  in  a  cultural 
analysis  context. 

Imagine, for the sake of argument, that we wish to compile a knowledge base consisting of
information about the various groups, organizations and social actors in Afghanistan. Consider
the sentence ‘The president of Afghanistan is Hamid Karzai.’ Following the initial deployment of
the aforementioned semantic annotation capability, the sentence is annotated as follows:

The president of <country>Afghanistan</country> is <person>Hamid Karzai</person>.

The annotations here (represented as XML elements) indicate that the text fragments
‘Afghanistan’ and ‘Hamid Karzi’ have been identified as instances of the ‘Country’ and
‘Person’ classes, respectively (both of these classes are defined in the ontology that was used to
train the semantic annotation subsystem). Once the entity annotations have been identified and
asserted into the textual resource, the relation extraction subsystem has a much richer resource
against which ontology-based relation extraction can take place. Importantly, it is only once
such annotations are in place that the real value of the ontology (for the detection and
extraction of relationships) can be appreciated. Thus, if we imagine that the ontology containing
the aforementioned ‘Person’ and ‘Country’ classes also asserts an ‘isPresidentOf’ relationship
between these classes,7 then a relation extraction system can establish an expectation or
prediction about the nature of the relationship between the person and country instances
identified in the above sentence. When this expectation is combined with further natural
language processing of the sentence, the system will be able to assert the appropriate semantic
annotation:

The <isPresidentOf>president of</isPresidentOf> <country>Afghanistan</country> is
<person>Hamid Karzi</person>.


7 This would be represented by asserting the ‘Person’ class as the domain of the ‘isPresidentOf’
relationship and the ‘Country’ class as the range of the relationship.

Obviously, the nature of the natural language processing that is performed on the sentence is
critical to this relation-based annotation capability. It is not sufficient for a system simply to
form an expectation about the kind of relationships that might occur between identified entities;
the system also needs to ascertain whether the sentential context is correct. Clearly, the
assertion of a particular relationship will only be appropriate in certain sentential contexts.
Thus, the ‘isPresidentOf’ relationship would not be appropriate if the target sentence was ‘Hamid
Karzi lives in Afghanistan.’ In this case, another type of relationship, such as ‘isInhabitantOf’,
would be more appropriate. The decision concerning which relationship (if any) to assert in a
particular sentential context is based on a strategy similar to that used in previous research,
most notably the Artequakt project (Alani et al., 2003a). Essentially, each relationship in the
ontology is associated with a ‘synset’ in the general-purpose lexical database WordNet (Miller,
Beckwith, Fellbaum, Gross, & Miller, 2004). When the relation extraction system executes, it
attempts to match the words in a sentence against the WordNet-based linguistic grounding
provided for each expected relationship. In addition to representing information about synonyms,
the WordNet database also represents hypernymy and hyponymy relationships. Using the WordNet
database can support the matching process by avoiding problems due to transliteration.
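
The two-part decision described above (an expectation derived from the ontology, followed by a
check of the sentential context) can be sketched roughly as follows. This is an illustrative
sketch only, not the Artequakt or Study 2 implementation: the ONTOLOGY_RELATIONS table, the
hand-written GROUNDING word sets (a stand-in for WordNet synsets), and the select_relation
function are all assumptions introduced for the example.

```python
# Hypothetical ontology fragment: relation -> (domain class, range class).
ONTOLOGY_RELATIONS = {
    "isPresidentOf": ("Person", "Country"),
    "isInhabitantOf": ("Person", "Country"),
}

# Hand-written stand-in for the WordNet synsets that ground each relation.
GROUNDING = {
    "isPresidentOf": {"president", "head of state"},
    "isInhabitantOf": {"lives", "resides", "inhabitant", "dwells"},
}


def select_relation(sentence, subject_class, object_class):
    """Return the relation whose domain/range fit the annotated entities
    and whose lexical grounding appears in the sentence, or None."""
    text = sentence.lower()
    for relation, (domain, range_) in ONTOLOGY_RELATIONS.items():
        # Expectation from the ontology: the entity classes must fit.
        if (domain, range_) != (subject_class, object_class):
            continue
        # Sentential context: some grounding term must occur in the text.
        if any(term in text for term in GROUNDING[relation]):
            return relation
    return None


print(select_relation("The president of Afghanistan is Hamid Karzi.", "Person", "Country"))
print(select_relation("Hamid Karzi lives in Afghanistan.", "Person", "Country"))
```

In this sketch a sentence that contains none of the grounding terms yields no relation at all,
which corresponds to the "if any" case in the decision described above.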

Relation  extraction  is  widely  recognized  as  a  much  harder  problem  than  entity  extraction.  The 
current  state-of-the-art  in  terms  of  precision  for  relation  extraction  is  about  70%,  whereas  in  the 
case  of  entity  extraction  it  is  somewhere  in  the  region  of  90%,  at  least  in  restricted  knowledge 
domains  (Sarawagi,  2008).  In  the  case  of  the  Artequakt  system,  Alani  et  al.  (2003c)  reported 
average  performance  metrics  of  85%  for  precision  and  42%  for  recall  for  10  artist  relations. 

These  results  were  obtained  from  an  empirical  study  that  applied  the  Artequakt  knowledge 
extraction  system  to  50  Web  pages  in  order  to  extract  knowledge  about  5  artists.  This  study 
resulted  in  the  generation  of  3,000  unique  RDF  triples,  each  specifying  factual  information  about 
the  artists  in  the  study.  However,  the  number  of  triples  actually  identified  by  human  knowledge 
engineers  across  the  50  source  documents  was  6,071.  Given  that  recall  is  the  number  of  correct 
answers  the  system  returns  relative  to  the  total  number  of  correct  answers  that  could  have  been 
returned,  we  can  infer  that  the  number  of  correct  triples  asserted  by  Artequakt  in  this  study  was 
approximately 2,550 (i.e., 42% of 6,071). Since precision is the number of correct answers a system returns
relative  to  the  total  number  of  answers  actually  returned,  we  can  infer  that  450  (15%  of  3,000) 
triples  returned  by  the  Artequakt  system  were  incorrect. 
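
The arithmetic behind these inferences can be checked with a few lines of code. The sketch below
uses only the figures reported above; the function name triple_counts and its parameter names
are our own and are introduced purely for illustration.

```python
def triple_counts(returned, gold, precision, recall):
    """Derive correct and incorrect triple counts from reported metrics.

    returned  -- number of triples the system asserted
    gold      -- number of triples identified by human knowledge engineers
    """
    correct_via_recall = round(recall * gold)            # recall = correct / gold
    correct_via_precision = round(precision * returned)  # precision = correct / returned
    incorrect = returned - correct_via_precision
    return correct_via_recall, correct_via_precision, incorrect


print(triple_counts(returned=3000, gold=6071, precision=0.85, recall=0.42))
```

Both routes give roughly the same number of correct triples, which indicates that the reported
precision and recall figures are mutually consistent.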

The body of past research on OBIE tools, techniques, and processes that we reviewed was
instrumental in developing an OBIE process to support cultural analysis.

OBIE  Process  for  Cultural  Analysis.  In  Study  2,  our  approach  was  to  use  cultural  ontologies  to 
support  the  extraction  of  culture-relevant  information  from  the  Web.  As  is  common  with  many 
OBIE  systems,  the  information  extraction  process  used  in  Study  2  includes  a  combination  of 
linguistic information and machine learning techniques in order to recognize and extract
information  content.  A  description  of  the  approach  to  knowledge  extraction  in  Study  2  is 
presented  in  Smart,  Sieck,  &  Shadbolt  (2011;  see  Appendix  C).  Step  1  in  the  process  is  to 
develop  an  initial  qualitative  cultural  model  using  a  limited  set  of  knowledge  sources.  Step  2 
involves  the  development  of  a  cultural  ontology  using  the  qualitative  cultural  model  as  a 
reference  point.  This  ontology  is  represented  using  the  Web  Ontology  Language  (OWL),  which 
has  emerged  as  a  de  facto  standard  for  formal  knowledge  representation  on  the  WWW.  Step  3  is 
to annotate sample texts manually using the cultural ontology in order to provide a training
corpus for rule learning. In the current context, rule learning is mediated by the LP2 algorithm,
a supervised algorithm that has been used to develop a variety of adaptive information
extraction and semantic annotation capabilities. Following the development of information
extraction  rules,  Step  4  is  to  apply  the  rules  to  Web  resources  in  order  to  identify  instances  of  the 
entities  defined  in  the  initial  qualitative  cultural  model.  Step  5  consists  of  the  identification  and 
extraction  of  causal  relations.  The  extraction  of  causal  relationships  is  a  difficult  challenge 
because  techniques  for  information  extraction  have  tended  to  focus  on  the  extraction  of  particular 
entities  in  a  text,  rather  than  the  relationships  between  those  entities.  We  attempt  to  extract  causal 
relationships  using  an  approach  that  combines  the  use  of  background  knowledge  in  the  form  of  a 
domain  ontology  with  the  general  purpose  lexical  database,  WordNet.  Finally,  in  Step  6,  the 
extracted  cultural  knowledge  is  integrated,  stored,  and  used  to  estimate  the  relative  frequencies 
of  the  various  ideas  presented  in  the  initial  qualitative  cultural  model. 
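
As a rough illustration of the final step, relative frequencies can be estimated by counting how
often each concept is extracted and dividing by the total number of extractions. The sketch
below is an assumption about how such a tally might be made, not a description of the Study 2
software; the function name and the short list of extracted concepts are invented for the
example.

```python
from collections import Counter


def relative_frequencies(extracted_concepts):
    """Map each concept to its share of all extracted concept mentions."""
    counts = Counter(extracted_concepts)
    total = sum(counts.values())
    return {concept: n / total for concept, n in counts.items()}


# Illustrative data only: concept labels attached to four extracted passages.
extracted = ["Jihad", "Islam", "Jihad", "Religion"]
print(relative_frequencies(extracted))
```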

OBIE  requires  the  presence  of  a  domain  ontology  that  machine-based  information  extraction 
processors can use. A domain ontology provides a formal, machine-readable representation of
domain  knowledge  in  some  target  domain,  and  it  therefore  supports  the  exploitation  of  domain 
knowledge  as  part  of  some  automated  process  (in  the  case  of  OBIE  the  automated  process  is,  of 
course,  the  detection,  identification  and  extraction  of  specific  instances  of  the  elements  defined 
in  the  ontology).  The  development  of  a  domain  ontology,  which  we  refer  to  as  the  “IEXTREME 
Ontology,”  was  thus  a  crucial  part  of  the  OBIE  process  adopted  in  Study  2,  as  described  next. 

IEXTREME  Ontology.  Ontologies  can  be  created  using  a  variety  of  different  languages; 
however,  the  language  that  is  the  most  widely  used  at  the  present  time  is  the  Web  Ontology 
Language  (OWL;  Antoniou  &  van  Harmelen,  2004).  OWL  emerged  as  a  language  for 
knowledge  representation  on  the  World  Wide  Web  in  2004,  and  the  World  Wide  Web 
Consortium  (W3C)  has  since  sanctioned  its  use.  OWL  was  selected  as  the  ontology 
representation  language  of  choice  in  Study  2  for  a  number  of  reasons.  First,  it  ensures 
compatibility  with  other  ontology  engineering  efforts  that  have  been  undertaken  in  other  areas. 
Second,  OWL  is  built  on  top  of  other  technologies,  such  as  RDF  (McBride,  2004),  which  have 
become  de  facto  standards  for  data  representation  on  the  WWW.  Third,  due  to  its  popularity, 
there  is  extensive  support  for  OWL  in  terms  of  ontology  development  and  use.  Fourth,  OWL 
avails  itself  of  formalisms  that  support  machine  reasoning.  Subsumption  reasoning,  for  example, 
enables  a  reasoner  to  compute  automatically  the  taxonomic  hierarchy  for  a  set  of  objects  in  the 
absence  of  an  explicit  specification  of  subsumption  relationships.  Reasoning  is  of  particular  use 
when  it  comes  to  developing  an  ontology;  for  example,  it  enables  an  ontology  engineer  to 
delegate  much  of  the  modeling  activity  to  the  reasoner.  Based  on  an  initial  characterization  of 
ontology  elements  we  can  rely  on  the  reasoner  to  infer  a  lot  of  the  structural  detail  relating  to  the 
model.  This  process  is  typically  referred  to  as  ‘ontology  normalization’  and  it  is  an  essential 
ingredient  of  engineering  large-scale  ontologies.  Another  reason  why  reasoning  is  important  to 
ontology  development  concerns  the  support  for  logical  consistency  checking.  Modeling  the 
knowledge  infrastructure  of  a  domain  is  a  difficult  process,  typically  the  preserve  of  experienced 
knowledge  engineers.  Fortunately,  OWL  is  amenable  to  a  variety  of  logical  consistency  checks 
that  ensure  the  logical  integrity  of  the  model,  and  these  can  help  avoid  common  modeling 
mistakes. 
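
To give a flavor of the simplest kind of inference involved, the sketch below derives indirect
subsumption relationships from directly asserted ones. It is a deliberately minimal sketch: a
real OWL reasoner also classifies classes from their logical definitions and checks consistency,
neither of which is attempted here. The function name and the dictionary layout are assumptions
made for the example; the class names are taken from the IEXTREME ontology.

```python
def superclasses(cls, direct_parents):
    """Return every class that subsumes `cls`, following asserted
    subclass links transitively."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in direct_parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Only the direct links are asserted; the Shirk -> ActionOrActivity
# relationship is inferred rather than stated.
direct_parents = {
    "Shirk": ["SinfulActivity"],
    "SinfulActivity": ["ActionOrActivity"],
}
print(sorted(superclasses("Shirk", direct_parents)))
```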

As  with  all  OWL  ontologies,  the  IEXTREME  ontology  consists  of  three  types  of  high-level 
knowledge structure, namely classes, properties and individuals. Classes represent sets of
individuals that belong together because they share common features or properties. Classes
broadly correspond to the notion of a concept in psychology. Properties represent relationships
between two individuals or between an individual and a data value. Some examples of properties
are hasChild and hasAge. The first of these properties can be used to relate individuals where a
parental relationship exists between them. It is an example of an object property in OWL. The
second property can be used to specify the age of a given individual. This is an example of a
datatype property in OWL. Datatype properties are used to link individuals to specific data
values. The third type of high-level knowledge structure in OWL is the individual. Individuals
correspond to specific instances of the classes defined in the ontology. They represent specific
objects in the domain of the ontology.
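
The distinction between object properties and datatype properties can be made concrete with a
small sketch. The Individual data structure below is our own simplification and is not how OWL
tools store an ontology; the individuals named Alice and Bob are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Individual:
    name: str
    cls: str  # the class this individual instantiates
    # Object properties link an individual to other individuals.
    object_props: dict = field(default_factory=dict)
    # Datatype properties link an individual to literal data values.
    datatype_props: dict = field(default_factory=dict)


parent = Individual("Alice", "Person")
child = Individual("Bob", "Person")

# hasChild is an object property: its value is another individual.
parent.object_props.setdefault("hasChild", []).append(child)

# hasAge is a datatype property: its value is a plain data value.
child.datatype_props["hasAge"] = 7

print(parent.object_props["hasChild"][0].name, child.datatype_props["hasAge"])
```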

In  order  to  create  the  IEXTREME  ontology,  we  first  analyzed  a  set  of  sentences  that  contained 
lexicalizations  of  metacognitive  values.  In  fact,  these  same  sentences  were  used  to  derive  the 
taxonomy of metacognitive values. Some examples of these sentences, along with the concepts
that eventually appeared in the ontology, are presented in Table 1.

Table 1. Examples of the Source Sentences from which Elements of the IEXTREME Ontology
were Derived

| Sentence | Ontology Elements |
|---|---|
| the ultimate goal of Jihad in Islam is for no religion to remain on Earth except Islam; | Jihad; Islam; Goal; Religion |
| development of all kinds of knowledge, scientific or otherwise, in the Muslim world; | MuslimWorld; BodyOfKnowledge; Muslim; ScientificBodyOfKnowledge; KnowledgeOrUnderstanding |
| Many verses of the Quran asks us to study, ask questions, think critically, reflect and consider; | Koran; ReligiousText; ThinkingOrReasoning; CriticalThinkingOrReasoning; Learning |
| the prophets said to the unbelieving infidels: Serve Allah, you have no other god but Him; | Infidel; Allah; God; Prophet; Disbeliever; Atheist |

The  aim  of  the  sentence  analysis  stage  was  to  derive  a  list  of  terms  that  could  be  used  as  the  basis 
for  ontology  construction.  In  most  cases,  the  terms  were  used  to  create  classes  that  were 
subsequently  organized  in  a  taxonomic  hierarchy  using  the  Protege  knowledge  editor.  To  support 
the  process  of  sentence  analysis,  we  created  a  tool  that  enabled  us  to  highlight  subsections  of 
each  sentence  and  link  these  subsections  to  specific  elements  (classes,  properties,  and 
individuals)  of  the  emerging  ontology.  Figure  7  shows  a  screenshot  of  this  tool.  The  tool  proved 
to  be  invaluable  in  terms  of  indicating  why  particular  ontology  elements  had  been  included  in  the 
ontology.  The  tool  also  proved  useful  in  terms  of  supporting  an  analysis  of  how  particular  kinds 
of ontology elements were distributed across various types of source sentence.

Figure 7. Screenshot of the tool used for analysis of source sentences.

After an initial class hierarchy had been created, the ontology was progressively refined and
enriched via the addition of elements and axioms. In most cases, this was achieved by referring
to online resources such as Wikipedia. The result of this iterative process of sentence analysis,
ontology manipulation, and knowledge capture from online resources was the IEXTREME
ontology. Figure 8 shows the ontology as it appears in the Protege knowledge editing tool.

Figure 8. Screenshot of the Protege knowledge editor showing the IEXTREME ontology.

Ontology  Metrics.  A  number  of  metrics  can  be  specified  for  an  ontology,  such  as  the  number  of 
classes,  properties  and  individuals  that  are  defined.  In  its  basic  form,  the  IEXTREME  ontology 
consists  of  the  following: 

•  Classes:  741 

•  Properties:  41 

• Individuals: 21

• Triples8: 3,044

8 A triple, in this case, refers to an RDF triple, which is an expression of a specific knowledge
statement in OWL.

However, other forms of the IEXTREME ontology feature different numbers of elements,
specifically individuals and triples. The basic form of the IEXTREME ontology consists of the
classes, properties, and individuals that were present at the conclusion of the ontology
development process; it does not include elements that a reasoner could infer, and which were
implicitly specified in the ontology. Alternative forms of the IEXTREME ontology include
elements that have been inferred by a reasoner, or which have been automatically asserted based
on specific uses of the ontology. In the latter case, for example, the IEXTREME ontology was
used to annotate a corpus of the sentences from the collection described above. Since the
sentences were already grouped into categories based on the metacognitive value they expressed,
it was hoped that an ontology-based annotation of the sentences might reveal something about
the differential frequency of occurrence of specific concepts in textual sources expressing
specific metacognitive values. For the purposes of comparison with the basic form of the
IEXTREME ontology, the version of the ontology used to support this analysis consisted of the
following elements:

•  Classes:  741 

•  Properties:  41 

•  Individuals:  2,144 

•  Triples:  12,951 

As  can  be  seen  from  these  figures,  one  of  the  key  differences  between  this  form  of  the  ontology 
and  the  basic  form  concerns  the  number  of  individuals  and  triples  in  the  ontology.  The  reason  for 
the  increase  relates  to  the  representation  of  sentences,  semantic  annotations,  and  data  generated 
as  part  of  the  analysis.  In  general,  most  applications  of  an  ontology  will  see  an  increase  in  the 
number  of  individuals  and  triples  rather  than  any  change  in  the  actual  classes  and  properties  of 
the  ontology. 

Key  Concepts.  The  IEXTREME  ontology  consists  of  a  variety  of  classes,  each  of  which  can  be 
seen  to  represent  concepts  in  the  domain  of  religious  extremism.  The  following  sections  aim  to 
provide  a  flavor  of  the  kinds  of  concepts  in  the  ontology.  For  the  purposes  of  brevity,  we  focus 
on  just  three  of  the  high-level  concepts  in  the  ontology,  namely  actions/activities,  agents  and 
metacognitive  values. 

Actions and Activities. The ActionOrActivity class is a generic class that represents the notion
of an action or activity. Within the context of the IEXTREME ontology, it serves as a superclass
for a number of other classes; for example, the CognitiveActivity, ConflictActivity,
ImmoralActivity, PoliticalActivity, ReligiousActivity and SinfulActivity classes. Each of these
subclasses can be further decomposed into other classes. For example, the SinfulActivity class
subsumes a number of actions or activities denoting sinful activities. These include Bidah,
Blasphemy, Idolatry, and Shirk.

The inclusion of classes representing various types of action or activity allows us to record
information about these phenomena in source materials. For example, in the Web page
corresponding to the SinfulActivity class, we see three sentences in which the notion of sin is
deemed to be expressed (see Figure 9). Two of the sentences are categorized as
JudgementAuthority sentences, and the other one is categorized as an
InformationExchangeSeparation sentence. Since these sentence categorizations correspond to
metacognitive values, we can begin to detect the differential frequency of occurrence of specific
concepts across different types of sentences corresponding to one or more metacognitive values.
In essence, this enables us to detect the conceptual markers of specific metacognitive values in
source sentences.


IEXTREME Ontology 1.0 (OWL Ontology)

SinfulActivity

The SinfulActivity class represents sinful activities.

Facets

• Type: AKTive8.OWL.Model.OWLNamedClass
• Is Deprecated: False
• ID: http://www.edefence.org/ontologies/iextreme.owl#SinfulActivity

Source Sentences (3)

• InformationExchangeSeparation: evildoers and vocal sin of all kinds;
• JudgementAuthority: Democracy is a sin of rights;
• JudgementAuthority: spread of sins and shirk the broadest;

Super Types

• ActionOrActivity

Sub Types

• Bida
• Bidah
• Blasphemy
• Idolatry
• Shirk

Equivalent Classes

• None

Figure 9. Web page showing information about the SinfulActivity class.

Agents. The Agent class represents agents within the IEXTREME ontology. An agent is defined
as something that is the performer of a specific action or activity. Thus, we can think of agents
as things that do things: things that cause specific things to happen. As seen in Figure 10, which
illustrates a part of the taxonomic hierarchy associated with the Agent class, the Agent class
subsumes a large number of subordinate classes. Most of these classes correspond to various
ways of viewing individual agents; for example, as atheists, infidels, martyrs, and so on.
However, the class hierarchy also incorporates the notion of spiritual agents, such as deities,
and collections of agents, such as organizations, groups, and societies.

Figure 10. Taxonomic hierarchy associated with the Agent class.

Metacognitive Values. Figure 11 provides a UML projection of the part of the IEXTREME
ontology that deals with the representation of metacognitive values (for the purposes of
brevity, some of the metacognitive value instances have been omitted). As can be seen
from the figure, four types of metacognitive value are represented in the ontology. These
are the KnowledgeTypeMetaCognitiveValue, JudgementTypeMetaCognitiveValue,
CoherenceTypeMetaCognitiveValue and InformationExchangeTypeMetaCognitiveValue classes.
Specific metacognitive values are represented as individuals that are instantiated from each of
these classes. For example, as shown in Figure 11, both the KnowledgeChangeMetaCognitiveValue
and the KnowledgeMaintenanceMetaCognitiveValue are instances of the
KnowledgeTypeMetaCognitiveValue class.

Figure 11. Representation of metacognitive values within the IEXTREME ontology.

One  way  in  which  metacognitive  values  are  used  in  the  IEXTREME  ontology  is  to  support 
the  classification  of  resources.  Figure  12,  for  example,  shows  how  metacognitive  values 
are  used  to  support  the  classification  of  a  particular  kind  of  resource,  namely  sentences. 

We  can  see  that  a  specific  sentence  is  linked  to  an  instance  of  the  SentenceClassification 
class  via  the  hasClassification  property.  This  instance  is  then  linked  to  a  metacognitive  value 
through  the  hasClassificationCategory  property.  The  result  is  that  we  can  represent  particular 
kinds  of  sentences  in  terms  of  their  linkage  to  particular  kinds  of  metacognitive  value.  A 
CoherenceConsistencySentence  class  can  be  defined,  for  example,  as  one  that  uses  logical 
expressions  involving  the  aforementioned  properties  in  order  to  make  it  clear  to  a  machine  what 
is  meant  by  the  notion  of  a  sentence  that  expresses  coherence  consistency  metacognitive  values. 
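
The two-step linkage described above (sentence, then classification instance, then metacognitive
value) can be sketched as a simple membership test over triples. This is a plain-Python
approximation of what an OWL class definition would express logically, not the definition used
in the ontology itself; the function name and the identifiers sentence1, sentence2,
classification1 and classification2 are hypothetical.

```python
def classified_as(sentence, value, triples):
    """True if `sentence` is linked, via a SentenceClassification
    instance, to the metacognitive value `value`."""
    # Step 1: follow hasClassification from the sentence.
    classifications = {o for s, p, o in triples
                       if s == sentence and p == "hasClassification"}
    # Step 2: follow hasClassificationCategory to the value.
    return any(s in classifications
               and p == "hasClassificationCategory"
               and o == value
               for s, p, o in triples)


triples = [
    ("sentence1", "hasClassification", "classification1"),
    ("classification1", "hasClassificationCategory",
     "KnowledgeMaintenanceMetaCognitiveValue"),
    ("sentence2", "hasClassification", "classification2"),
    ("classification2", "hasClassificationCategory",
     "KnowledgeChangeMetaCognitiveValue"),
]
print(classified_as("sentence1", "KnowledgeMaintenanceMetaCognitiveValue", triples))
```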


Figure 12. Using metacognitive values to classify resources.

Ontology Viewer Application. In order to support the visualization and evaluation of the
IEXTREME  ontology,  we  developed  a  custom  application  called  the  Ontology  Viewer 
Application  (OVA;  see  Figure  13).  This  application  provides  a  variety  of  basic  visualization  and 
navigation  features.  For  example,  it  allows  the  IEXTREME  ontology  to  be  viewed  as  a  set  of 
inter-linked  web  pages.  It  also  enables  users  to  publish  the  IEXTREME  ontology  as  a  set  of  web 
pages  for  independent  viewing.  More  advanced  features  of  the  application  include  the  ability  to 
query  the  ontology  contents  using  the  semantic  web  query  language,  SPARQL  (DuCharme, 
2011),  the  ability  to  invoke  a  semantic  reasoner  to  reason  over  the  ontology,  and  the  ability  to 
create rules to customize the reasoning process.

IEXTREME Ontology

Project Overview

The IEXTREME project is a trans-Atlantic collaborative project, funded by the U.S. Office of
Naval Research. The project is a collaborative venture between the University of
Southampton, Applied Research Associates and Rababy & Associates LLC, with Applied
Research Associates acting as the prime contractor. The main goal of the IEXTREME
project is to develop a better understanding of the ideological enablers associated with the
behaviour of terrorist and insurgent groups. The National Military Strategic Plan for the
War on Terrorism identifies extremist ideology as the enemy's strategic center of gravity,
and the Department of Defense (DoD) plays a significant role in establishing an
environment unfavourable to extremist ideas, terrorist recruitment, and support. In spite
of this, however, we have, as yet, little understanding of the specific ways in which
extremist ideology contributes to various forms of terrorist action. IEXTREME aims to address
this shortcoming by combining state-of-the-art approaches to cultural modelling with a variety
of advanced knowledge technologies.

Figure 13. The IEXTREME Ontology Viewer Application.

The  functionality  of  the  OVA  includes  the  following: 

• Visualization and Browsing: The main function of the OVA is to support
visualization and browsing of the IEXTREME ontology. The OVA accomplishes this
function by providing a Web-based visualization interface to the ontology. The
Web-based interface, in this case, is simply a set of Web pages that describe each of the
elements in the ontology. Figure 13 shows an example of a Web page showing a specific
ontology element.

• Searching: The OVA provides a search form, which a user can use to locate specific
elements within the ontology.

• Querying: The OVA provides capabilities to edit and execute semantic queries defined
using the SPARQL query language.9 Figure 14 further illustrates the query capabilities of
the OVA.

• Rule Editing: The OVA provides a Rule Editor Tool, which enables users to create and
edit their own rules for the ontology. These rules can be used to support custom reasoning
processes.

• Reasoning: The OVA provides a Reasoning Tool, which implements a reasoning
capability for the IEXTREME ontology.

• Publication: The OVA enables a user to publish the IEXTREME ontology as a set of
HTML web pages.

The  functionality  of  the  OVA  in  two  specific  areas,  namely  querying  and  reasoning,  is  described 
in  the  following  subsections. 

Retrieving  Information  Using  Semantic  Queries.  One  of  the  capabilities  provided  by  the  OVA  is 
the  ability  to  create,  edit  and  execute  semantic  queries  against  the  IEXTREME  ontology.  This 
can  be  extremely  useful  because  it  enables  us  to  ask  questions  about  the  structure  and  content  of 
the  ontology. 


Figure  14.  Semantic  query  tool. 

Query  capabilities  are  made  available  by  the  OVA  through  the  Semantic  Query  Tool 
(see  Figure  14).  This  form  comprises  a  number  of  interface  elements  described  in  Table  2. 


9 http://www.w3.org/TR/rdf-sparql-query/

Table 2. Functionality of the Semantic Query Tool

| Number | Name | Description |
|---|---|---|
| 1 | Toolbar | The Toolbar features two buttons that enable the user to save the contents of the Query Repository (which consists of user-defined semantic queries) and to switch the current view of the Query Repository. Both of these functions are described in more detail below. |
| 2 | Query Repository | The Query Repository displays all the semantic queries that have been created by a user of the application. Each query is represented by an icon and a text label, the latter corresponding to the name of the query. When the user selects a query from the Query Repository, the text of the query is displayed in the Query Editor. |
| 3 | Query Editor | The Query Editor shows the text of a query which has been selected from the Query Repository. As its name suggests, the Query Editor can be used to edit the syntax of an existing query.10 |
| 4 | Execute Button | The Execute Button is used for executing queries that have been selected from the Query Repository. The results of query execution are shown in a separate dialog box. If the query cannot be interpreted for any reason, the application will throw an exception and an error dialog will be displayed. |

A  query  can  be  created  using  the  Query  Editor  component  and  then  executed  by  clicking  the 
Execute  Button.  The  query  shown  in  Figure  14  is  a  relatively  simple  query;  it  simply  asks  what 
types  of  deities  have  been  asserted  in  the  ontology.  If  we  click  the  Execute  Button  while  this 
query  appears  in  the  Query  Editor,  then  we  see  the  results  shown  in  Figure  15.  These  results  tell 
us that there are three types of deity asserted in the ontology: Allah, ChristianGod, and
JewishGod.


3 query results

http://www.edefence.org/ontologies/iextreme.owl#ChristianGod
http://www.edefence.org/ontologies/iextreme.owl#JewishGod

Figure 15. Query results for deities query.


10 Note that the Semantic Query Tool only supports SPARQL queries; other types of query
language are not supported. More information about the SPARQL query language can be found on
the SPARQL website (http://www.w3.org/TR/rdf-sparql-query/).

We can also use queries to pose more complex questions. Consider the following query:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX ix: <http://www.edefence.org/ontologies/iextreme.owl#>

SELECT (str(?text) AS ?result)
WHERE
{
    ?sentence rdf:type ix:Sentence .
    ?sentence ix:hasSemanticAnnotation ?anno .
    ?sentence ix:hasText ?text .
    {
        {
            # annotation refers to a subclass of SinfulActivity
            ?anno ix:ontologyElement ?element .
            ?element rdfs:subClassOf ix:SinfulActivity .
        }
        UNION
        {
            # annotation refers to the SinfulActivity class itself
            ?anno ix:ontologyElement ix:SinfulActivity .
        }
        UNION
        {
            # annotation refers to an instance of SinfulActivity
            ?anno ix:ontologyElement ?element .
            ?element rdf:type ix:SinfulActivity .
        }
    }
}

This query is clearly much more complex than the previous one. In this case, it is asking about
the sentences that mention things related to sinful activities. Essentially, it retrieves the text
of sentences that are annotated with any of three kinds of ontology element:

• elements that are subclasses of the SinfulActivity class,

• the SinfulActivity class itself, and

• elements that are instances of the SinfulActivity class.

If  we  execute  this  query,  we  see  the  query  results  listed  in  Figure  16. 
8 query results

result
marry not idolatrous till they believe;
they made Shirk, for which He had sent no authority;
Democracy is a sin of rights;
knows the extent of kufr and shirk;
spread of sins and shirk the broadest;
guilty of blasphemy;
great Shirk, Greater Shirk is when a person associates Partners with Allah, while Islam says that Allah has n...

Figure 16. Query results for the sentences expressing sinful activities query.
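The logic of the three UNION branches can be sketched outside a SPARQL engine. The following is a minimal Python illustration over a toy in-memory triple set, not the OVA implementation. All identifiers and texts in it (ix:s1, ix:act1, "text of s1", and so on) are hypothetical, and unlike SPARQL it returns each matching sentence once rather than one row per solution.

```python
# Toy triple store; every identifier and text below is hypothetical.
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"
TARGET = "ix:SinfulActivity"

TRIPLES = {
    ("ix:Shirk", SUBCLASS_OF, TARGET),   # a subclass of the target class
    ("ix:act1", RDF_TYPE, TARGET),       # an instance of the target class
    ("ix:s1", RDF_TYPE, "ix:Sentence"),
    ("ix:s1", "ix:hasSemanticAnnotation", "ix:a1"),
    ("ix:s1", "ix:hasText", "text of s1"),
    ("ix:a1", "ix:ontologyElement", "ix:Shirk"),
    ("ix:s2", RDF_TYPE, "ix:Sentence"),
    ("ix:s2", "ix:hasSemanticAnnotation", "ix:a2"),
    ("ix:s2", "ix:hasText", "text of s2"),
    ("ix:a2", "ix:ontologyElement", TARGET),
    ("ix:s3", RDF_TYPE, "ix:Sentence"),
    ("ix:s3", "ix:hasSemanticAnnotation", "ix:a3"),
    ("ix:s3", "ix:hasText", "text of s3"),
    ("ix:a3", "ix:ontologyElement", "ix:Peace"),
    ("ix:s4", RDF_TYPE, "ix:Sentence"),
    ("ix:s4", "ix:hasSemanticAnnotation", "ix:a4"),
    ("ix:s4", "ix:hasText", "text of s4"),
    ("ix:a4", "ix:ontologyElement", "ix:act1"),
}


def objects(subject, predicate):
    """Return all objects o such that (subject, predicate, o) is a triple."""
    return {o for (s, p, o) in TRIPLES if s == subject and p == predicate}


def matches_target(element):
    """True if element is the target class, a direct subclass, or an instance."""
    return (
        element == TARGET
        or (element, SUBCLASS_OF, TARGET) in TRIPLES
        or (element, RDF_TYPE, TARGET) in TRIPLES
    )


def sinful_activity_sentences():
    """Texts of sentences with an annotation matching any UNION branch."""
    sentences = sorted(
        s for (s, p, o) in TRIPLES if p == RDF_TYPE and o == "ix:Sentence"
    )
    results = []
    for sentence in sentences:
        elements = {
            element
            for anno in objects(sentence, "ix:hasSemanticAnnotation")
            for element in objects(anno, "ix:ontologyElement")
        }
        if any(matches_target(e) for e in elements):
            results.extend(sorted(objects(sentence, "ix:hasText")))
    return results


print(sinful_activity_sentences())
```

In this toy data the sentence annotated with ix:Peace is excluded, while the sentences annotated with a subclass, the class itself, or an instance are returned.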

Next,  consider  the  query  illustrated  below: 

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX ix: <http://www.edefence.org/ontologies/iextreme.owl#>

SELECT DISTINCT ?element
WHERE
{
  ?sentence rdf:type ix:Sentence .
  ?sentence ix:hasClassification ?sc .
  ?sc rdf:type ix:SentenceClassification .
  ?sc ix:hasClassificationCategory ix:KnowledgeMaintenanceMetaCognitiveValue .
  ?sentence ix:hasSemanticAnnotation ?anno .
  ?anno ix:ontologyElement ?element .
}

This query retrieves all the ontology elements that are used to annotate sentences that express knowledge maintenance metacognitive values. Queries of this kind tell us something about the kinds of conceptualizations used in different kinds of sentences. Figure 17 illustrates the results of this query.
42 query results

element
http://www.edefence.org/ontologies/iextreme.owl#Islam
http://www.edefence.org/ontologies/iextreme.owl#ShariaLaw
http://www.edefence.org/ontologies/iextreme.owl#PoliticalBeliefOrValue
http://www.edefence.org/ontologies/iextreme.owl#ConflictActivity
http://www.edefence.org/ontologies/iextreme.owl#Ummah
http://www.edefence.org/ontologies/iextreme.owl#Cognition
http://www.edefence.org/ontologies/iextreme.owl#Religion
http://www.edefence.org/ontologies/iextreme.owl#Innovation
http://www.edefence.org/ontologies/iextreme.owl#EvilThing
http://www.edefence.org/ontologies/iextreme.owl#Belief
http://www.edefence.org/ontologies/iextreme.owl#Misguidance
http://www.edefence.org/ontologies/iextreme.owl#CognitiveActivity
http://www.edefence.org/ontologies/iextreme.owl#KnowledgeOrUnderstanding
http://www.edefence.org/ontologies/iextreme.owl#Muslim
http://www.edefence.org/ontologies/iextreme.owl#ReligiousWorship
http://www.edefence.org/ontologies/iextreme.owl#ThinkingOrReasoning
http://www.edefence.org/ontologies/iextreme.owl#ReligiousText

Figure 17. Query results for the ontology elements associated with knowledge maintenance sentences query.

Finally,  consider  the  query  below: 

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX ix: <http://www.edefence.org/ontologies/iextreme.owl#>

SELECT DISTINCT (str(?text) AS ?result)
WHERE
{
  ?anno ix:ontologyElement ix:KnowledgeOrUnderstanding .
  ?anno a ix:SemanticAnnotation .
  ?anno ix:selectedText ?text .
}


This  query  asks  for  the  text  associated  with  a  specific  kind  of  semantic  annotation,  namely  one 
that  features  the  KnowledgeOrUnderstanding  class,  to  be  returned.  If  we  execute  this  query,  we 
see  the  query  results  listed  in  Figure  18. 
7 query results

result
understands
understanding
wisdom
knowledge
Wisdom
know
understand

Figure 18. Query results for the text associated with semantic annotations using the KnowledgeOrUnderstanding class query.

Rule-Based Reasoning. In addition to a querying capability, the OVA also provides the ability to create, edit and execute rules. These rules can be used to implement sophisticated forms of inference that exploit the semantic infrastructure of the IEXTREME ontology. As an example of this inferential capability, consider a rule designed to automatically classify sentences as instances of a class called KnowledgeMaintenanceSentence. The syntax of this rule is presented in Figure 19.


Figure  19.  Rule  Editor  Tool  showing  ‘classify-knowledge-maintenance-sentences’  rule. 
This rule (and indeed all rules in the OVA) is implemented as a CLIPS rule using the ‘defrule’ construct.11 Each rule features a set of conditions on the left-hand side of the rule, which are matched against the triples in the ontology. If the conditions of the rule are satisfied, then the statements on the right-hand side or ‘action’ part of the rule are executed. In general, the action part of the rule contains statements that assert new triples into the ontology. Thus, the rule in Figure 19 executes (or ‘fires’) when the following conditions are met:

• there is a triple x which has ‘http://www.w3.org/2000/01/rdf-schema#subPropertyOf’ as its predicate

• the subject of x is the predicate of another triple y, which has a subject a and an object b

• there is no triple which has the object of x as its predicate and in which a and b are the subject and object, respectively

The action part of this rule asserts a triple whose predicate is the object of x, whose subject is a, and whose object is b. The rule essentially adds triples representing the fact that two objects linked by a property z must also be linked by any properties that are superproperties of z. For example, suppose we have two properties, isParentOf and isFatherOf, where isFatherOf is a subproperty of isParentOf, and two instances, Bill and Peter, where Bill is linked to Peter via the isFatherOf property. A reasoner should then be able to infer that Bill is also a parent of Peter and assert that Bill isParentOf Peter. When asked whether Bill is a parent of Peter, an intelligent machine should be able to draw on the semantics of the ontology to infer that Bill is indeed a parent of Peter and, therefore, give a semantically sensible response to the question posed to it.
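The three conditions and the action described above amount to a forward-chaining step that is repeated until nothing new can be asserted. The sketch below expresses that idea in Python rather than CLIPS, over a plain set of (subject, predicate, object) tuples. The function name and the use of bare strings such as "isFatherOf" in place of full URIs are assumptions made for illustration.

```python
SUB_PROPERTY_OF = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"


def apply_subproperty_rule(triples):
    """Assert (a, super_p, b) for every (a, sub_p, b) until a fixed point."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for (sub_p, pred, super_p) in list(inferred):
            # Condition 1: triple x has rdfs:subPropertyOf as its predicate.
            if pred != SUB_PROPERTY_OF:
                continue
            for (a, p, b) in list(inferred):
                # Condition 2: the subject of x is the predicate of triple y.
                # Condition 3: the inferred triple is not already present.
                if p == sub_p and (a, super_p, b) not in inferred:
                    inferred.add((a, super_p, b))  # the rule's action part
                    changed = True
    return inferred


facts = {
    ("isFatherOf", SUB_PROPERTY_OF, "isParentOf"),
    ("Bill", "isFatherOf", "Peter"),
}
closure = apply_subproperty_rule(facts)
print(("Bill", "isParentOf", "Peter") in closure)
```

The third condition is what stops the rule from firing forever: once the superproperty triple exists, the rule no longer matches.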

Once  a  rule  has  been  defined,  a  reasoner  can  use  it  to  perform  inferences  against  the  IEXTREME 
ontology.  This  is  accomplished  within  the  OVA  by  using  the  Reasoning  Tool  (see  Figure  20). 

The Reasoning Tool allows a user to execute the rules created with the Rule Editor Tool, and it then adds whatever knowledge statements the reasoner inferred to the IEXTREME ontology. This capability can enrich the ontology in various ways. For example, once the ‘classify-knowledge-maintenance-sentences’ rule shown in Figure 19 has been executed by the reasoner, the syntactic complexity of semantic queries involving instances of the KnowledgeMaintenanceSentence class can be reduced considerably. Recall, for instance, the query used to retrieve the ontology elements used in semantic annotations of sentences expressing knowledge maintenance metacognitive values. The original version of the query was as follows:


11 More information about the CLIPS language can be found on the CLIPS website (http://clipsrules.sourceforge.net/).



PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX ix: <http://www.edefence.org/ontologies/iextreme.owl#>

SELECT DISTINCT ?element
WHERE
{
  ?sentence rdf:type ix:Sentence .
  ?sentence ix:hasClassification ?sc .
  ?sc rdf:type ix:SentenceClassification .
  ?sc ix:hasClassificationCategory ix:KnowledgeMaintenanceMetaCognitiveValue .
  ?sentence ix:hasSemanticAnnotation ?anno .
  ?anno ix:ontologyElement ?element .
}


Figure  20.  The  OVA  Reasoning  Tool. 
Now, following the execution of the ‘classify-knowledge-maintenance-sentences’ rule, the syntax of this query can be simplified to the following:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX ix: <http://www.edefence.org/ontologies/iextreme.owl#>

SELECT DISTINCT ?element
WHERE
{
  ?sentence rdf:type ix:KnowledgeMaintenanceSentence .
  ?sentence ix:hasSemanticAnnotation ?anno .
  ?anno ix:ontologyElement ?element .
}

Using the Ontology to Annotate Source Sentences. As mentioned in previous sections, the IEXTREME Ontology has been used to support the analysis of a set of source sentences that were used as part of the derivation of metacognitive values at an earlier stage of the project. These sentences were originally taken from the set of religious (Islamic) textual resources described in Study 1, and they each reflect the expression of a specific kind of metacognitive value. Table 3 shows some examples of these sentences.

Table 3. Examples of Sentences Expressing Different Kinds of Metacognitive Value

Metacognitive Value: Example Sentences

Coherence Consistency (CC)
• To accept the idea of pluralism means that you do not care much about religion.
• The ultimate goal of Jihad in Islam is for no religion to remain on Earth except Islam.
• The union between monotheism and polytheism is very evil.

Coherence Diversity (CD)
• Just as men have invented different languages to talk to each other, so they have invented different religions to talk to God, and God understands them all well enough.

Information Exchange Interaction (IEI)
• By facilitating our relations with Non-Muslims then we reflect a brighter picture of Islam and Muslims.
• A Muslim is allowed to marry a non-Muslim, Christian or Jew, and should give her the liberty to practice her religion without interfering.

Information Exchange Separation (IES)
• Fight those who believe not in Allah nor the Last Day.
• Jihad alone that liberates the Muslim lands from the grip of the unbelievers.

Judgment Authority (JA)
• The right to understand and explain Islam is confined to Muslim jurists.

Judgment Independence (JI)
• Islam has no generally accepted clerical hierarchy or bureaucratic organization.
• Ijtihad can be a tool for understanding Islamic principles in a way that fits the needs and challenges of individuals and societies.

Knowledge Change (KC)
• Our views should change to reflect our understanding of new knowledge or experience or different circumstances.
• Seek knowledge by even going to China, for seeking knowledge is incumbent on every Muslim.

Knowledge Maintenance (KM)
• Every innovation is misguidance and every misguidance is in the fire.
• He who learns knowledge for other than God, and his aim be other than God, will abide in fire.
In total, there were 467 sentences, each assigned to one of eight categories of metacognitive values. In order to better understand how the concepts in the IEXTREME ontology were distributed across these sentences (and thus to better understand the conceptual markers of metacognitive values), we used the IEXTREME Ontology to semantically annotate all 467 sentences.

The  results  of  the  analysis  show  how  many  times  various  elements  in  the  ontology  appear  in  the 
different  types  of  sentences.  For  example,  the  notion  of  Allah  has  the  following  distribution 
pattern  across  the  sentences: 

•  Coherence  Consistency  Sentences:  5 

•  Coherence  Diversity  Sentences:  0 

•  Information  Exchange  Interaction  Sentences:  4 

•  Information  Exchange  Separation  Sentences:  4 

•  Judgment  Authority  Sentences:  8 

•  Judgment  Independence  Sentences:  5 

•  Knowledge  Change  Sentences:  2 

•  Knowledge  Maintenance  Sentences:  1 

From this distribution pattern, we can see that the notion of Allah does not tend to discriminate between the various kinds of sentences: it has a rather uniform distribution across the sentence categories, and thus its power as a discriminative feature for sentence classification is poor. In contrast to this, consider the notion of a body or system of knowledge, which is represented in the IEXTREME ontology via the BodyOfKnowledge class. This concept has the following sentence distribution profile:

•  Coherence  Consistency  Sentences:  0 

•  Coherence  Diversity  Sentences:  0 

•  Information  Exchange  Interaction  Sentences:  0 

• Information Exchange Separation Sentences: 0

•  Judgment  Authority  Sentences:  0 

•  Judgment  Independence  Sentences:  0 

•  Knowledge  Change  Sentences:  36 

•  Knowledge  Maintenance  Sentences:  22 

Here  we  see  that  the  distribution  profile  of  the  BodyOfKnowledge  concept  is  not  uniform  across 
the  sentence  categories;  instead,  it  only  appears  in  the  Knowledge  Change  and  Knowledge 
Maintenance  sentences.  As  a  result,  we  can  say  that  the  BodyOfKnowledge  concept  has  the 
potential  to  discriminate  knowledge  change  and  knowledge  maintenance  metacognitive  values. 
In  other  words,  it  may  serve  as  a  conceptual  marker  for  these  types  of  metacognitive  values  in 
novel  sentences. 
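One way to make the contrast between the two distribution profiles concrete is to compute how evenly each concept is spread across the eight categories. The sketch below uses normalized Shannon entropy for this purpose. This measure is an assumption chosen for illustration, not a statistic reported in the study; only the counts are taken from the two lists above.

```python
import math

CATEGORIES = ["CC", "CD", "IEI", "IES", "JA", "JI", "KC", "KM"]

# Counts copied from the two distribution profiles given in the text.
ALLAH = [5, 0, 4, 4, 8, 5, 2, 1]
BODY_OF_KNOWLEDGE = [0, 0, 0, 0, 0, 0, 36, 22]


def normalized_entropy(counts):
    """Return entropy scaled to [0, 1]; 1 means a perfectly uniform spread."""
    total = sum(counts)
    probabilities = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log(p) for p in probabilities)
    return entropy / math.log(len(counts))


def categories_present(counts):
    """Return the category labels in which the concept occurs at least once."""
    return [label for label, c in zip(CATEGORIES, counts) if c > 0]


for name, counts in [("Allah", ALLAH), ("BodyOfKnowledge", BODY_OF_KNOWLEDGE)]:
    print(name, round(normalized_entropy(counts), 2), categories_present(counts))
```

A value near 1 indicates a concept spread across most categories (a weak marker), while a low value indicates a concept concentrated in a few categories (a candidate marker).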

In general, the results of the sentence-based analysis indicate that a number of ontology elements may serve as important discriminative features for the identification of specific metacognitive values. Table 4 lists these elements and shows which types of metacognitive value each may help to discriminate.
Table 4. Table Showing Discriminative Potential of a Subset of Ontology Elements

Column headings: Ontology Element, CC, CD, IEI, IES, JA, JI, KC, KM. Each row gives an element followed by the marks recorded for it; the column in which each mark falls is not recoverable here.

BodyOfKnowledge  *
CriticalThinkingOrReasoning  *
Fighting  *
FreedomOfChoice  *
FreedomOfExpression  *
Ijtihad  * *
Innovation  * *
KnowledgeAcquisitionActivity  *
KnowledgeOrUnderstanding  * *
Learning  *
LogicalThinking
Marriage
Misguidance  *
NewThing  * *
NonMuslim  *
Peace  *
Pilgrimage  *
RationalThinking  *
Relationship  *
ReligiousScholar
SacredLaw  * *
ScientificBodyOfKnowledge  *
Sect  *
Shirk  *
ThinkingOrReasoning
ToleranceAttitude  *
Ummah  *
UsefulKnowledge  *
UselessKnowledge  *
ValuedThing  *
On the basis of this analysis, it seems that Coherence Consistency and Coherence Diversity values may be the most difficult to identify using ontology-based forms of analysis.

This section has provided an overview of the IEXTREME Ontology developed to support OBIE in the context of Study 2. The structure and content of the ontology have been presented, and a specific use of the ontology to annotate sentences from religious texts has been described. Furthermore, an application developed to support the browsing and visualization of the IEXTREME ontology has been described, and some of the more advanced features of this application (most notably its querying and reasoning capabilities) have been discussed. In the next section, we describe the specific Study 2 methods used to incorporate the IEXTREME ontology as a component of the overall OBIE analysis approach to extract and measure metacognitive values in the source documents.

Method 

In this section, we describe the text analysis methodology that was used to process religious text data in order to measure the extreme and moderate sentiments with respect to the dimensions introduced in the previous section. The main idea of the proposed approach is to measure what we call the “polarity” of the input text fragments based on the semantic comparison of triples extracted from the input documents. We give more detail on what we mean by “polarity” and “triples” in the next sections. This approach requires that part of the original dataset be used to train our model. The method follows the process flow illustrated in Figure 21.
Figure 21. Process flow of our approach.

Step 1: Pre-Processing. In this step, the input document was split into a set of sentences. The sentence splitter is based on the OpenNLP system.12 For each sentence, a feature vector was built and used as input to a classifier in order to determine whether the sentence was relevant to the analysis. The feature vector was computed using the Bag of Words model, and its values were assigned with a frequency-based method (Manning, Raghavan, & Schütze, 2008). In particular, we used the value $\mathrm{tf}(t,s) \cdot \mathrm{idf}(t,s)$, where $\mathrm{tf}(t,s) = f(t,s)/|N|$ is the frequency of the term $t$ in the sentence $s$, $|N|$ is the cardinality of our vocabulary, and $\mathrm{idf}(t,s) = \log(|S|/f(t,d))$ is the inverse document frequency, that is, a measure of the ratio between the number of selected sentences ($|S|$) and the number of sentences in which the term $t$ appears ($f(t,d)$). The classifier was a Decision Tree based on the C4.5 algorithm (Quinlan, 1993).
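The weighting scheme described in Step 1 can be sketched as follows. This is a minimal illustration under simplifying assumptions: tokens are obtained by lowercasing and splitting on whitespace, the vocabulary is built from the input sentences themselves, and the function name is hypothetical. It follows the formula as stated above, with the term count divided by the vocabulary size, and it does not include the C4.5 classifier.

```python
import math
from collections import Counter


def tfidf_vectors(sentences):
    """Return (vocabulary, vectors) using tf = count / |N|, idf = log(|S| / sf)."""
    tokenized = [s.lower().split() for s in sentences]
    vocabulary = sorted({t for tokens in tokenized for t in tokens})
    n_vocab = len(vocabulary)        # |N|, the size of the vocabulary
    n_sentences = len(tokenized)     # |S|, the number of sentences
    # Number of sentences in which each term appears at least once.
    sentence_freq = Counter(t for tokens in tokenized for t in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([
            (counts[t] / n_vocab) * math.log(n_sentences / sentence_freq[t])
            for t in vocabulary
        ])
    return vocabulary, vectors


vocabulary, vectors = tfidf_vectors([
    "every innovation is misguidance",
    "every misguidance is in the fire",
])
print(vocabulary)
print(vectors[0])
```

Terms that occur in every sentence (here "every", "is" and "misguidance") receive a weight of zero, so only terms that help separate sentences contribute to the feature vector.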

Step 2: NLP Processing. In this step, each sentence was processed in order to extract metadata that was used in subsequent steps. In particular, we implemented a classical tokenization procedure, and for each token, we computed Part of Speech (POS) tags. After this stage, we eliminated stop word tokens that appeared on a predefined stop word list. We built the stop word list based on the ones provided by the Stanford library.13 POS tags were computed using the Stanford NLP tagger, made available from the same website. We also computed the lemma of each token based on the morphological analysis offered by the Stanford library. Token lemmatization enabled us to process tokens in a similar fashion irrespective of their plural, comparative, and superlative forms.

Step 2: Triple Extraction. In this step, we extracted a set of triples from each sentence. This phase is used to compute what we defined as the “polarity” of triples. The polarity of a triple is the numeric measure used to weight the triple with respect to the metacognitive values described in Study 1.

The polarity of a triple gives information about the distribution of the cultural values within the text from which the triple is extracted. A triple is defined as an (S; V; O) structure, where S is the set of tokens that plays the role of subject, V is the set of tokens that plays the role of verb, and O is the set of tokens that plays the role of object. The triples are extracted using the following pattern:

(N|A)* - (V|P)* - (N|A)*

where N is a proper noun, a noun, a personal pronoun or a foreign word; A is an adjective; V is a verb that can be in past, past participle, base or gerund form; and P is an adverb, particle, preposition or conjunction. The pattern follows regular expression syntax: the symbol (·)* indicates that the related sub-pattern can be matched zero or more times, and (|) represents an ‘or’ condition. For each sentence, we can extract different triples by using the position of the next token tagged as V. For the sake of clarity, let us consider how we extract two successive triples. Suppose that we have found a set of tokens that matches the pattern described above and we then meet a token classified as V. In that case, we generate a new triple in which the tokens that belong to the previous O fill the new set S. This simple rule has some exceptions, which are as follows:


12 http://incubator.apache.org/opennlp/

13 http://nlp.stanford.edu/software/index.shtml
• if the closest token to the one classified as V is tagged as a personal pronoun (‘he’, ‘they’, etc.) or existential there (i.e., ‘there’), then S is an empty set

• if the closest token to the one classified as V is tagged as a pronoun (e.g., ‘where’, ‘what’ or ‘who’), then S={t}, where t is the closest token to the verb classified as a proper noun or noun

• if the two closest tokens to the one classified as V are tagged as a proper noun or noun and they are separated by a conjunction (i.e., ‘and’), then S={t}, with t being the closer of the two tokens

We also take negation into account; in particular, we associate the negation with the closest token classified as N, V, or A. We note that this extraction method can be problematic in the case of passive forms or in cases where there are co-references between different tokens in the sentences, known as anaphoric references. Most of the solutions proposed in the NLP community to solve these problems rely on generating a parse tree and analyzing the linguistic dependencies among tokens. We found that these approaches increase computation time without delivering significant performance gains. Moreover, our text is obtained from a machine translation stage, which can cause problems for parse tree generation. This is the reason for adopting an approach based on rules defined over the POS tags: in our case, rules on POS tags proved to be more reliable than rules on parse tree structures built on top of machine-translated documents.
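The basic chaining rule, in which the object of one triple becomes the subject of the next, can be sketched over a list of POS-tagged lemmas. The code below is a simplified illustration only: it assumes the input has already been lemmatized, stripped of stop words, and mapped to the coarse tags N, A, V and P used in the pattern, and it omits the three exceptions and the negation handling described above. The function name is hypothetical.

```python
def extract_triples(tagged_tokens):
    """Chain (subject; verb; object) triples from (lemma, coarse_tag) pairs."""
    # Group consecutive tokens into alternating noun/adjective and verb groups.
    groups = []
    for lemma, tag in tagged_tokens:
        if tag in ("N", "A"):
            kind = "NA"
        elif tag in ("V", "P"):
            kind = "VP"
        else:
            continue  # tags outside the pattern are ignored
        if groups and groups[-1][0] == kind:
            groups[-1][1].append(lemma)
        else:
            groups.append((kind, [lemma]))

    # Each verb group yields one triple; its neighbours are subject and object.
    triples = []
    for index, (kind, lemmas) in enumerate(groups):
        if kind != "VP":
            continue
        subject = groups[index - 1][1] if index > 0 else []
        obj = groups[index + 1][1] if index + 1 < len(groups) else []
        triples.append((tuple(subject), tuple(lemmas), tuple(obj)))
    return triples


tagged = [
    ("religion", "N"), ("become", "V"), ("dynamic", "A"), ("evolving", "A"),
    ("concept", "N"), ("address", "V"), ("problem", "N"), ("challenge", "N"),
]
for triple in extract_triples(tagged):
    print(triple)
```

Because the groups alternate, the object group of the first triple is reused as the subject group of the second, which is the behaviour the chaining rule describes. An empty tuple stands for a missing subject or object.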

For example, Table 5 shows sentences from the “Knowledge” cultural dimension, with its values “Maintenance” and “Change.” The same table also shows some of the triples obtained from these sentences. We have bolded the subject and the object of the triples, and we use ‘-’ to indicate a missing element in a triple.

Table 5. Sample Sentences and Associated Triples for the ‘Knowledge’ Cultural Value Dimension

Knowledge value: Maintenance
Sentence: Every Innovation is misguidance and every misguidance is in the fire.
Extracted triples: (innovation; be; misguidance), (misguidance; be; fire)

Knowledge value: Maintenance
Sentence: America comes to change the fundamentals, of the nation changing the curriculum and blocking the road to awakening
Extracted triples: (america; come, change; fundamental, nation), (fundamental, nation; change; curriculum), (curriculum; block; road), (road; awake; -)

Knowledge value: Change
Sentence: Religion becomes a dynamic and evolving concept addressing our problem and challenges.
Extracted triples: (religion; become; dynamic, evolving concept), (dynamic, evolving concept; address; problem, challenge)

Knowledge value: Change
Sentence: There are hundreds of Prophetic traditions that encourage Muslims to acquire all types of knowledge from any corner of the world.
Extracted triples: (-; be; hundred, prophetic, traditions), (hundred, prophetic, traditions; encourage; Muslim), (Muslim; acquire; all, types, knowledge, corner, world)
Step 3: Semantic Processing. In this step, we translate the triples obtained from the previous step into a set of triples comprised of concepts retrieved from a knowledge base. In particular, we use the lexical resource WordNet as a knowledge base (Fellbaum, 1998). WordNet is a well-known resource that focuses on the semantic connections between words using linguistic criteria. Because it is the most comprehensive semantic resource for the English language, we chose to adopt it in order to enrich the explicit semantic representation contained in the source material.

WordNet encodes concepts in terms of sets of synonyms (called synsets). WordNet (version 3.0) contains about 155,000 words organized into over 117,000 synsets. We associate a synset with each token using the function $\phi$ (phi), defined as follows:

$$\phi : t \mapsto \arg\max_{s_y \in SY_t} \mathrm{score}(s_y, C_t)$$

where $t$ is the input token, $SY_t$ is the set of synsets associated with the token $t$ in WordNet, and $C_t$ is S, V, or O if $t$ belongs to the subject, verb, or object part of the triple, respectively. The score is computed as follows:

$$\mathrm{score}(s_y, C_t) = \sum_{s'_y \in \partial(s_y, R)} \left| C_t \cap \mathrm{gloss}(s'_y) \right|$$

where $\mathrm{gloss}(s'_y)$ is the set of words used to define the synset $s'_y$ in WordNet, and $\partial(s_y, R)$ is a function that returns the set made up of $s_y$ plus all the synsets that can be obtained from $s_y$ using the relations in $R$. This is one of the scores that has been used to cope with the word sense disambiguation problem (Navigli, 2009). It is based on the computation of the intersection between the context in which the word appears and the context in which the synset is defined. In our case, we compute the context of the synset using its gloss and the glosses of its semantic neighbors. By “semantic neighbors,” we mean the synsets obtained by applying different relations in WordNet such as hyponymy, troponymy, hypernymy, meronymy, holonymy and entailment. We note that these relations are selected in line with the POS associated with the token. For example, we derived and expanded the synsets for “concept,” which is classified as a noun, by looking at the closest synsets in WordNet that are also nouns.
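The gloss-overlap score can be illustrated with a small, self-contained sketch. The synset identifiers, glosses, and neighbor lists below are invented for the example and are not taken from WordNet; in the actual method they would come from WordNet lookups. The function names are likewise hypothetical.

```python
# Hypothetical toy data standing in for WordNet lookups.
CANDIDATES = {"fire": ["fire.combustion", "fire.dismissal"]}
GLOSSES = {
    "fire.combustion": {"burning", "flame", "heat"},
    "fire.dismissal": {"terminate", "employment", "job"},
    "flame": {"burning", "gas", "light"},
}
NEIGHBORS = {"fire.combustion": ["flame"]}  # synsets reachable via relations R


def score(synset, context):
    """Sum the gloss overlaps of the synset and of its semantic neighbors."""
    expanded = [synset] + NEIGHBORS.get(synset, [])
    return sum(len(context & GLOSSES[s]) for s in expanded)


def phi(token, context):
    """Pick the candidate synset of the token with the highest overlap score."""
    return max(CANDIDATES[token], key=lambda synset: score(synset, context))


# Context words from the part of the triple (S, V or O) that contains the token.
context = {"burning", "heat", "punishment"}
print(phi("fire", context), score("fire.combustion", context))
```

Expanding the synset with its neighbors matters when the synset's own gloss is short: here the neighbor's gloss contributes one additional overlapping word to the score.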

After this automated processing step, our data has been reduced to sets of triples whose constituent elements (i.e., subject, verb, and object) correspond to synsets in WordNet.

Step  4:  Orientation.  In  this  last  step,  we  computed  the  orientation  of  each  triple  with  respect  to 
the  cultural  values.  The  orientation  of  a  sentence  s  with  respect  to  a  dimension  D  is  defined  as 
follows: 


$$O(s, D) = \frac{1}{|D|} \sum_{tr_j \in D} str(tr_s, tr_j) - \frac{1}{|\bar{D}|} \sum_{tr_i \in \bar{D}} str(tr_s, tr_i)$$

where $tr_s$ is the set of triples extracted from $s$, $D$ is the set of triples used as a reference for the cultural value $D$, and $\bar{D}$ is the set made up of the triples that are not in $D$. The similarity $str$ between triples is defined as follows:

$$str(tr_i, tr_j) = \frac{1}{|A|\,|B|} \sum_{s_{y_i} \in A,\; s_{y_j} \in B} s_{sy}(s_{y_i}, s_{y_j}) + \frac{1}{|V_i|\,|V_j|} \sum_{s_{y_k} \in V_i,\; s_{y_r} \in V_j} s_{sy}(s_{y_k}, s_{y_r})$$

where $A$ is the set of synsets derived from the union of S and O belonging to $tr_i$; $B$ is the set of synsets derived from the union of S and O belonging to $tr_j$; and $V_i$ and $V_j$ are the sets of synsets associated with the verbs of $tr_i$ and $tr_j$, respectively. The similarity function between two synsets, $s_{sy}$, has been previously validated in the computational linguistics literature (Leacock, Miller, & Chodorow, 1998). If we find a high similarity between two synsets and one of them is associated with a negation tag, then the similarity is defined as 0.
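The orientation computation can be sketched as a difference between two average similarities: one against the reference triples for a value and one against the remaining triples. The code below is an illustration under several stated assumptions. An exact-match function stands in for the Leacock, Miller, and Chodorow similarity, the negation rule is omitted, a sentence with several triples is handled by averaging over all pairs, and the result is not rescaled. All names and example triples are hypothetical.

```python
def exact_match(a, b):
    """Stand-in synset similarity (assumption): 1.0 if identical, else 0.0."""
    return 1.0 if a == b else 0.0


def mean_similarity(xs, ys, synset_sim):
    """Average pairwise similarity between two sets; 0.0 if either is empty."""
    if not xs or not ys:
        return 0.0
    return sum(synset_sim(x, y) for x in xs for y in ys) / (len(xs) * len(ys))


def triple_similarity(tr_i, tr_j, synset_sim):
    """Subject/object similarity plus verb similarity, as in the str formula."""
    (s_i, v_i, o_i), (s_j, v_j, o_j) = tr_i, tr_j
    a, b = set(s_i) | set(o_i), set(s_j) | set(o_j)
    return (mean_similarity(a, b, synset_sim)
            + mean_similarity(set(v_i), set(v_j), synset_sim))


def orientation(sentence_triples, reference, complement, synset_sim):
    """Average similarity to the reference set minus that to its complement."""
    def average(triple_set):
        pairs = [(t, r) for t in sentence_triples for r in triple_set]
        if not pairs:
            return 0.0
        return sum(triple_similarity(t, r, synset_sim) for t, r in pairs) / len(pairs)
    return average(reference) - average(complement)


sentence = [(("innovation",), ("be",), ("misguidance",))]
maintenance = [(("innovation",), ("be",), ("misguidance",))]
change = [(("religion",), ("become",), ("concept",))]
print(orientation(sentence, maintenance, change, exact_match))
```

A positive value means the sentence's triples are closer to the reference set; swapping the two sets flips the sign.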

Analysis  and  Results 

A total of 234 documents from the collection described previously were used in the analysis. Recall that most of the documents are religious sermons posted on religious sites or in blogs. Table 6 shows the distribution of unique words used in the analysis collection. Documents in Arabic were first subjected to machine translation and then corrected by a native Arabic speaker, as described previously in Study 1.

Table 6. Statistics of Word Distribution in our Corpus

                     Files   Total Words   Average Number of Words per File
Arabic Documents        71         12406                                637
English Documents      163        222976                                664

In order to deal with the uncertainty of the initial classifier used to discriminate relevant from irrelevant sentences, we additionally selected 150 sentences from a web site that provides free access to religious texts.14 In particular, we selected the sentences that have the highest Jaccard coefficient with the ones collected in our dataset.15 These sentences served as background noise for the sentences that were analyzed. This step was taken because we needed to evaluate our methodology in the presence of some noise, and the 150 sentences selected from the aforementioned website are sentences about religion that do not express any content related to cultural values.
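The selection of noise sentences by Jaccard coefficient can be sketched as follows. The text does not say how a candidate's similarity to the whole dataset was aggregated, so taking the maximum over dataset sentences is an assumption. Whitespace tokenization, the function names, and the example sentences are also illustrative.

```python
def jaccard(sentence_a, sentence_b):
    """Jaccard coefficient between the word sets of two sentences."""
    words_a = set(sentence_a.lower().split())
    words_b = set(sentence_b.lower().split())
    if not words_a and not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def most_similar(candidates, dataset, k):
    """Return the k candidates with the highest similarity to any dataset sentence."""
    scored = [(max(jaccard(c, s) for s in dataset), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [candidate for _, candidate in scored[:k]]


dataset = ["every innovation is misguidance"]
candidates = ["the weather is fine", "every innovation is new"]
print(most_similar(candidates, dataset, 1))
```

Selecting high-overlap sentences makes the noise harder to separate from the analyzed sentences than randomly drawn text would be.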

For our experiment, we used half of our collection as a training set. This means that we used half of the sentences in our collection to derive a set of triples that are used in the computation of the orientation formula. In other words, each unseen triple is compared against half of our collection. We repeated this cross-validation process 10 times, each time randomly dividing the collection into two parts. Figures 22-29 report the average results.
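The evaluation protocol, ten random divisions of the collection into two halves, can be sketched as follows. The function name, the fixed seed, and the use of integers as stand-ins for sentences are assumptions made so that the example is reproducible.

```python
import random


def repeated_half_splits(items, repetitions=10, seed=0):
    """Return a list of (training_half, held_out_half) random splits."""
    rng = random.Random(seed)  # fixed seed (assumption) for reproducibility
    splits = []
    for _ in range(repetitions):
        shuffled = list(items)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        splits.append((shuffled[:half], shuffled[half:]))
    return splits


splits = repeated_half_splits(range(20))
for training, held_out in splits[:2]:
    print(sorted(training), sorted(held_out))
```

In each repetition the training half would supply the reference triples, and the orientation of the held-out triples would be computed against them before averaging over the ten runs.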

In those plots, the triples are on the X axis and the average orientation (sentiment) computed over the 10 experiments is on the Y axis (the range of the orientation measure is [-1,1]). We observe that if we adopt a strict division into positive and negative values, we reach reasonable levels of performance for several of the dimensions. Results for each of the four dimensions are described in turn.

14 http://www.sacred-texts.com/download.htm

15 http://en.wikipedia.org/wiki/Jaccard_index

The  orientations,  or  sentiment  values,  for  the  Coherence  dimension  were  set  so  that  homogeny 
had  a  positive  orientation  and  diversity  a  negative  orientation.  As  seen  in  Figures  22  and  23,  the 
orientation  values  correspond  reasonably  well  with  the  human  annotation.  If  a  cutoff  for  value 
classification  is  set  at  0  (represented  by  +/-  in  the  figures),  then  the  average  percentage 
agreement  with  the  human  raters  is  79%,  which  is  good  in  the  context  of  automated  sentiment 
analysis. 
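
For concreteness, percentage agreement at a cutoff can be computed as in the sketch below. The
data layout (a list of average orientation scores paired with human labels of +1 or -1) and the
names are assumptions of this sketch, not the report's implementation; scores exactly at the
cutoff are treated as negative here.

```python
def percent_agreement(scores, human_labels, cutoff=0.0):
    """Percentage of items whose thresholded score matches the human label.

    scores: average orientation values in [-1, 1]
    human_labels: +1 or -1 per item, as assigned by human raters
    """
    if len(scores) != len(human_labels):
        raise ValueError("scores and human_labels must have equal length")
    predicted = [1 if s > cutoff else -1 for s in scores]
    matches = sum(p == h for p, h in zip(predicted, human_labels))
    return 100.0 * matches / len(scores)
```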


Figure  22.  Coherence:  Homogeny.


Figure 23. Coherence: Diversity.

The sentiment orientations for the Knowledge dimension were set so that maintenance had a
positive orientation and change had a negative orientation. As seen in Figures 24 and 25, the
orientation values reveal a bias towards the change end of the spectrum. Even so, the simple 0
threshold for value classification (represented by +/- in the figures) produces above-chance
reliability. If we correct for bias by setting the threshold to -0.2, then the model
achieves 67% average percentage agreement with human raters, which is reasonable
performance.
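
One way to see how a shifted threshold can compensate for bias is to sweep candidate cutoffs and
compare agreement with the human labels. The sketch below is illustrative only: the scores and
labels are made-up toy values, the grid of cutoffs is arbitrary, and the report does not state how
its threshold was chosen.

```python
def agreement(scores, labels, cutoff):
    """Percentage of items where (score > cutoff) matches the +1/-1 label."""
    hits = sum((1 if s > cutoff else -1) == y for s, y in zip(scores, labels))
    return 100.0 * hits / len(scores)


def best_cutoff(scores, labels, candidates):
    """Return the candidate cutoff with the highest agreement (first wins ties)."""
    return max(candidates, key=lambda c: agreement(scores, labels, c))


# Toy data skewed toward negative scores, mimicking a bias toward one pole.
scores = [-0.5, -0.4, -0.3, -0.1, -0.1, 0.2]
labels = [-1, -1, -1, 1, 1, 1]
candidates = [0.0, -0.2, -0.4]
chosen = best_cutoff(scores, labels, candidates)
```

A cutoff tuned on the same items it is evaluated on will tend to look optimistic, so in practice
it would be safer to choose it on held-out data.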


Figure  24.  Knowledge  maintenance.


Figure  25.  Knowledge  change. 

The sentiment orientations for the information exchange dimension were set so that separation
had a positive orientation and interaction a negative orientation. As seen in Figures 26 and 27,
the orientation values reveal a bias towards the separation end of the spectrum. Even so, the
simple 0 threshold for value classification (represented by +/- in the figures) produces
above-chance reliability. If we correct for bias by setting the threshold to 0.4, then the
model achieves 61% average percentage agreement with human raters, a moderate level of
reliability for automated sentiment analysis.

Figure 26. Information exchange: Separation.

Figure 27. Information exchange: Interaction.

The sentiment orientations for the judgment dimension were set so that authority had a positive
orientation and independence a negative orientation. Figures 28 and 29 show levels of
performance similar to those for the information exchange dimension, though without indication
of bias. The 0 threshold for value classification (represented by +/- in the figures) achieves
60% average percentage agreement with human raters.


Figure  28.  Judgment  authority. 


Figure  29.  Judgment  independence. 

In sum, the best sentiment classification performance was found for the Coherence and Knowledge
dimensions, and the results for the Information Exchange and Judgment dimensions were moderate.
Most sentiment classification analyses have focused on very tangible objects for determination of
attitudinal orientation (e.g., movies, food products, and the like). Until now, there has been little
attempt to measure sentiments associated with intangible beliefs, such as the metacognitive
beliefs explored here, and hence the present results can be seen to set a new standard in the
field. With respect to the substantive issue of measuring metacognitive values, the results
correspond well with the findings of Study 1. These results provide additional confirmation for
the conclusion that moderate and extremist ideologies can be discriminated on the basis of the
metacognitive values that are embedded in their religious texts.

DISCUSSION

Methodological  Implications 

To the best of our knowledge, our approach is the first attempt to propose a computational
method to measure extremist cultural values from web-based sources. However, previous
research has addressed somewhat related issues. For example, an interesting survey of cultural
influences on online social question/answer behavior has been presented in the computer science
literature, yet that study did not apply computational methods to analyze its text data (Yang,
Morris, Teevan, Adamic, & Ackerman, 2011). There have also been several recent additions to
the literature on the extraction of sentiment information from Web-based sources (Lin, Xing, &
Hauptmann, 2008). However, most of the techniques in the information extraction literature
focus their attention on extraction algorithms and ignore the specification and selection of
features that can support extraction goals. Most of the existing approaches are statistical,
such as those based on topic models (Blei, Ng, & Jordan, 2003). The topic model approach is
based on the idea that a document can be represented as a mixture of topics, each topic
being a probability distribution over the words in the vocabulary of the corpus under consideration.
Very few relevant applications exist, including a topic model to discriminate different ideologies
in text (Lin et al., 2008), as well as proposed models to understand political ideals (Yano, Cohen,
& Smith, 2009). In these cases, however, the goal was simply to distinguish between main
political groups, as opposed to the current objective of measuring the degree of sentiment towards
particular ideals within groups. Ontological approaches, such as the one we have adopted here, have
improved in their abilities with respect to more general text classification problems. For
example, some research has examined the improvements that may be obtained by using
semantic features, enriching the term vectors with concepts from a core ontology
(Hotho, Staab, & Stumme, 2003). These researchers show how the procedure can improve
classification performance by solving the synonym problem and by identifying more
general topics. In particular, they suggest three strategies, which consider only
concepts, only terms, or a combination of the two. They show that on data sets like Reuters-21578,
using a clustering algorithm like Bi-Section KMeans, they can obtain an improvement of 8.4% over
the baseline by adding concepts to the term vector and by using word sense disambiguation
procedures and a hierarchy structure among concepts. Other work enhances the classical document
representation with concepts extracted from background knowledge and uses a boosting
approach as the classifier (Bloehdorn & Hotho, 2004), showing a gain of 3.29% in F1-Measure on
the Reuters-21578 collection. In addition, some researchers have introduced the concept of
semantic kernels; their results indicate a consistent improvement in F1-Measure values
(Bloehdorn, Basili, Cammisa, & Moschitti, 2006; Wang & Domeniconi, 2008). The approach we
have adopted follows naturally from these earlier efforts, extending the general ideas to cope
with sentiments associated with abstract beliefs.
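
The idea of enriching term vectors with ontology concepts can be illustrated with a small
bag-of-words sketch. The `CONCEPTS` mapping below is a hypothetical toy stand-in for an ontology
lookup, and the "add concepts to terms" strategy shown is only one of the three strategies
mentioned above; none of this reproduces the cited authors' implementations.

```python
from collections import Counter

# Hypothetical toy ontology: term -> more general concept.
CONCEPTS = {
    "car": "vehicle",
    "automobile": "vehicle",
    "truck": "vehicle",
    "bank": "financial_institution",
}


def term_vector(text):
    """Plain bag-of-words counts over lowercased whitespace tokens."""
    return Counter(text.lower().split())


def enriched_vector(text, concepts=CONCEPTS):
    """Term counts plus counts of the concepts those terms map to."""
    vec = term_vector(text)
    # Iterate over a snapshot so the counter can be extended safely.
    for term, count in list(vec.items()):
        concept = concepts.get(term)
        if concept is not None:
            vec["concept:" + concept] += count
    return vec
```

With this representation, two documents that use different synonyms ("car" and "automobile")
share no term features but do share the `concept:vehicle` feature, which is the sense in which
concept enrichment addresses the synonym problem.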

The current advances in automated sentiment analysis have been essential for applications to
cultural modeling. Recent research in cultural modeling techniques has emphasized new ways of
representing cultural knowledge (Sieck et al., 2010a). These representation formats have further
led to novel developments in areas of semi-structured and structured elicitation methods for
direct human data collection (Sieck et al., 2010c), as well as in simulating influences of
information on culturally-shared beliefs (Sieck et al., 2011). A computational method for
measuring cultural values in web-based resources adds another significant component to the
cultural analyst’s toolkit. By providing a means to extract and quantify the cultural values
embedded in large and increasing volumes of text being generated on the web, the present work
moves a step closer to the realization of a “social radar” for monitoring and modeling changes in
the sentiments of citizens and leaders (Maybury, 2010).

Substantive  Implications 

The  successful  exploitation  of  religious  texts  is  often  a  key  component  of  developing  certainty 
among  supporters  of  terrorist  agendas.  Yet,  the  specific  kinds  of  religious  ideas  that  promote 
such  certitude  lack  systematic  evaluation.  That  is,  how  do  specific  religious  ideas  eliminate 
doubt  in  the  minds  of  religious  extremists  and  their  supporters?  Our  primary  hypothesis  is  that 
extremist  interpretations  of  religious  doctrine  include  specific  “metacognitive”  beliefs  that  serve 
to  erase  doubt  in  the  group’s  cause  and  provide  psychological  defenses  against  contrary  views. 
Metacognitive  beliefs  are  specific  kinds  of  beliefs  that  affect  the  cognitive  processes  that  govern 
feelings  of  confidence  in  worldviews.  The  excessive  levels  of  confidence  that  ultimately  result 
from  certain  types  of  metacognitive  beliefs  serve  to  promote  decisive  action.  We  expect  that 
these  kinds  of  beliefs  are  manifested  in  one  form  or  another  in  the  ideologies  of  any  religious 
extremist  organization,  and  possibly  secular  ones  as  well.  In  the  current  study,  we  examined 
Islam  as  a  test  case.  Specifically,  we  compared  various  Muslim  beliefs  and  doctrine  as  expressed 
on  extremist  and  mainstream  Islamic  web  sites.  The  findings  of  Studies  1  and  2  provide 
encouraging  support  for  the  metacognitive  approach. 

As  described  in  the  introduction,  the  overall  goal  of  the  present  effort  was  to  understand 
extremist  ideological  influences  underlying  terrorist  and  insurgent  behavior,  in  a  way  that 
supports  the  future  development  of  predictive  models  of  adversary  decision  making.  The 
National  Military  Strategic  Plan  for  the  War  on  Terrorism  identifies  extremist  ideology  as  the 
enemy’s  strategic  center  of  gravity,  and  DOD  plays  a  significant  role  in  establishing  an 
environment  unfavorable  to  extremist  ideas,  recruiting,  and  support  (Wald,  2006).  Yet,  the 
specific  ideological  characteristics  that  serve  as  enablers  for  extreme  action  have  not  been  well 
understood.  The  results  of  this  project  provide  invaluable  input  to  the  development  of  accurate 
models  of  adversary  decision  making,  as  well  as  for  the  cognitive  characterization  of  Islamic 
groups  based  on  their  ideological  commitments  in  a  way  that  directly  supports  information 
operations and strategic communications. A critical aspect of establishing an environment
unfavorable to extremist ideas is to begin to take apart the rhetoric of terror-sponsoring
organizations, and address their ideologies through communication (Speckhard, 2006). In doing
this, we may remove the appeal of religiously inspired myths of terrorist acts as the glorious
correction of moral wrongdoing (Sageman, 2008). To make this approach work, we must first
unpack  the  ideologies  themselves,  specifically,  the  extremist  ideas  that  promote  moral  certainty 
within  the  terrorist  mind.  The  current  studies  successfully  demonstrate  an  approach,  including 
both  a  model  of  cultural  values  embedded  in  such  ideologies,  and  computational  methods  for 
extracting  those  values  from  web  sources. 

Abstract

Terrorists  attempt  to  communicate  specific  aspects  of  their  ideological  frameworks  to  shape  the 
common  perspective  of  their  intended  audiences.  For  the  approach  to  be  successful,  the  ideas 
they  are  promoting  must  fit  within  the  cultural  meaning  systems  shared  across  the  population 
they  are  addressing.  Knowing  what  messages  will  effectively  persuade  their  constituents  is  likely 
intuitive  for  terrorists  operating  within  their  own  cultural  environment,  but  not  necessarily  for 
researchers  who  come  from  distinct  cultural  backgrounds.  A  method  is  thus  described  for 
studying  in  detail  the  common  perspective  that  members  of  a  culture  bring  to  a  situation.  The 
method  results  in  models  of  the  culture  that  provide  a  basis  for  outsiders  to  begin  to  frame  events 
from  the  cultural-insider  point  of  view.  The  cultural  models  can  then  be  used  as  an  aid  to 
anticipate  how  messages  will  be  interpreted  and  evaluated  by  terrorists  and  their  audiences. 

Keywords:  Cultural  epidemiology,  mental  models,  political  violence,  terrorist  mind,  jihad,  Islam 

The purpose of this paper is to describe an approach to cultural modelling, called cultural network
analysis (CNA), and its application to terrorism research. Cultural network analysis builds on a
foundation  of  research  practices  drawn  from  the  fields  of  cognitive  anthropology,  cultural  and 
cognitive  psychology,  and  decision  analysis.  It  improves  upon  current  cultural  research 
techniques  by  providing  a  systematic  method  for  constructing  cultural  models  for  groups, 
organisations,  or  wider  societies.  The  essential  idea  is  that,  by  studying  in  detail  the  common 
perspective  that  members  of  a  culture  bring  to  a  situation,  a  model  of  the  culture  can  be 
constructed  that  provides  a  basis  for  an  outsider  to  begin  to  frame  events  from  their  point  of 
view.  The  model  can  then  be  used  for  a  variety  of  purposes,  such  as  an  aid  to  anticipating  how 
messages  will  be  interpreted  and  evaluated  by  members  of  the  culture.  Cultural  models  derived 
by  CNA  are  represented  graphically  as  a  network  of  the  culturally-shared  concepts,  causal 
beliefs, and values that influence key decisions in a particular context[1]. In their most fully
developed form, cultural models also convey detailed quantitative information about the
prevalence of their specific components. In order to establish a context for addressing
contributions that cultural modelling can make to terrorism research, we briefly review progress
made in understanding terrorism more generally.

Advances  in  understanding  the  reasons  behind  jihadist  terrorism  have  been  made  in  the  last 
several  years,  though  the  evidential  research  base  remains  thin[2].  Generally,  terrorist  support 
and  recruitment  are  not  due  to  any  single  causal  factor,  but  instead  stem  from  the  interplay 
between  political  aspirations  of  terrorist  groups,  vulnerable  individuals,  employment  of  jihadist 
ideology,  and  wider  social  support  for  terrorism.  These  latter  components  increasingly  depend  on 
a variety of modern modes of communication that are used to propagate the group vision of the
world to a broad set of constituents. The overall communication strategies of jihadist terrorist
organisations can be generally characterised as aiming to:

1.  motivate  ordinary  persons  to  carry  out  terrorist  acts  to  meet  the  organisation’s  objectives;

2.  exploit  moral  outrage  and  feelings  of  humiliation  based  on  political  events; 

3.  convince  by  means  of  religious  texts  used  on  behalf  of  terror  ideology. 

We  discuss  each  of  these  components  of  terrorist  strategy  in  turn.  First,  with  respect  to  profiles  of 
individuals,  what  research  there  is  indicates  that  suicide  terrorists  have  no  appreciable 
psychopathology  and  are  at  least  as  educated  and  economically  well-off  as  their  surrounding 
populations[3]. Furthermore, education does not appear to be correlated with support for
terrorism. Finally, although economic despair may provide a partial answer, it does not offer a
complete explanation[4]. Importantly, individuals who are vulnerable to terrorist recruitment are
not  motivated  to  take  part  in  suicide  terrorism  without  some  form  of  ideology  to  guide  them,  as 
well  as  an  overall  organisation  to  support  their  activities[5]. 

The  balance  of  evidence  suggests  that  terrorists  tend  to  be  from  at  least  moderately  religious 
backgrounds.  For  example,  interviews  with  terrorist  recruits  in  Pakistan  indicated  that,  “None 
were  uneducated,  desperately  poor,  simple  minded  or  depressed,”  and  “all  were  deeply 
religious.”  They  believed  that  their  acts  were  “sanctioned  by  the  divinely  revealed  religion  of 
Islam”[6]. Furthermore, it also seems clear that religiosity is fostered as a part of the
indoctrination process and that external events can trigger greater attention to religion. For
example,  Bosnian  Muslims  typically  report  not  considering  religious  affiliation  a  significant  part 
of  identity  until  seemingly  arbitrary  violence  forced  awareness  upon  them[7].  This  is  not  to 
suggest  that  the  root  of  terrorist  motive  is  religion,  only  that  religious  beliefs  and  values  form  an 
important  component  of  jihadist  groups’  descriptions  of  their  world. 

The  second  component  of  jihadist  terrorist  strategy  is  exploitation  of  public  emotional  responses 
to  political  events.  Terrorist  organisations  appear  to  be  quite  sophisticated  in  their  use  of  modern 
media, including use of the World Wide Web to disseminate vivid imagery of moral wrongdoing
by Americans and other agents of the West. Furthermore, humiliating and morally outrageous
events are not considered isolated or random, but rather are interpreted within an overarching
framework holding that a unified Western strategy exists to promote a “war against Islam”[8].

The  third  component  of  terrorist  strategy  is  ensuring  that  recruits  are  so  thoroughly  convinced 
that  they  won’t  consider  backing  out,  let  alone  feel  any  mercy  or  remorse  about  their  actions.  For 
a  suicide  terrorist  in  particular,  this  means  they  will  act  with  no  doubt  about  their  decision  to  die 
in  order  to  kill  others[9].  For  example,  the  fully  indoctrinated  terrorist  has  been  described  as 
being  completely  free  of  any  ambiguity  or  doubt  about  the  mission  or  the  means  to  accomplish  it 
[10].  This  religious  conviction  includes  a  fundamental  belief  that  the  terrorist  knows  the  mind  of 
God.  Such  a  belief  justifies  a  complete  lack  of  tolerance  for  divergent  ideas,  even  of  other 
believers  who  disagree  with  the  terrorist  group  on  specific  issues  (i.e.,  the  true  believer  exists 
apart  from  all  others). 

Each  of  these  strategies  relies  heavily  on  terrorist  communication  of  specific  aspects  from  their 
ideological  framework  to  shape  the  common  perspective  of  their  intended  audiences.  For  the 
approach  to  be  successful,  the  ideas  they  are  promoting  must  fit  within  the  cultural  meaning 
systems  shared  across  the  population  they  are  addressing.  One  application  of  cultural  modelling 
to  terrorism  research  is  to  explicitly  map  out  the  relevant  cultural  meaning  systems  in  order  to 
better  understand  how  and  why  various  messages  appear  to  be  effective  in  influencing  people’s 
attitudes and garnering their support. Before addressing culture in terrorism, however, we first
need  to  define  culture. 

Concept  of  Culture 

There  is  a  somewhat  natural  tendency  to  talk  about  culture  as  if  it  were  a  concrete,  material 
thing.  It  is  sometimes  described  as  something  people  belong  to,  or  as  an  external  substance  or 
force  that  surrounds  its  members  and  guides  their  behaviour.  Although  it  is  sometimes  difficult 
to  avoid  speaking  in  these  metaphorical  terms,  such  an  ethereal  view  does  not  provide  a  useful 
basis  for  a  technical  definition.  An  alternative  approach  begins  by  defining  culture  in  terms  of  the 
widely  shared  ideas  (such  as  concepts,  values,  and  beliefs)  that  comprise  a  shared  symbolic 
meaning system [11]. Within this conception, approximately equivalent and complementary
learned  meanings  are  maintained  by  a  population,  or  by  identifiable  segments  of  a  population.  In 
this  statement,  ‘approximately  equivalent’  acknowledges  that  no  two  people  within  a  culture 
share  exactly  the  same  ideas,  but  rather  highly-similar  meanings  are  shared  by  most  members  of 
a  society.  The  ‘complementary’  component  refers  to  the  fact  that  sharing  of  specialised 
knowledge depends on status and roles within a society (e.g. an imam and a farmer).

Taking this conception a step further, it is currently popular within cognitive science to draw on a
disease metaphor for understanding cultural ideas, describing the ideas that spread widely
through a population and persist for substantial periods of time as especially ‘contagious’ [12].
This  theoretical  framework  is  often  referred  to  as  the  epidemiological  view  of  culture,  drawing 
on  the  general  sense  of  epidemiology  as  describing  and  explaining  the  distributions  of  any 
property  within  a  population.  The  starting  point  for  working  from  this  epidemiological  view  is 
the  individual  idea  as  an  atomic  unit.  People  typically  use  the  word  idea  to  refer  to  any  content 
of  the  mind,  including  conceptions  of  how  things  are  and  of  how  things  should  be.  For  instance, 
individuals  may  hold  the  concept  that  Western  nations  are  joined  together  in  a  covert  war  against 
Islam. Their minds may also contain the value that imported Western ideals, such as the
separation  of  religious  and  state  affairs,  are  generally  bad  and  so  should  be  avoided.  Ideas  are 
often  treated  as  independent  units  by  social  scientists,  or  grouped  together  into  categories  of 
belief  for  simplicity.  A  key  premise  of  the  current  approach  is  that  cultural  knowledge  consists 
of  shared  networks  of  ideas,  and  that  there  is  value  in  explicitly  considering  clusters  of  ideas  and 
their  interrelationships.  Networks  of  causally-interconnected  ideas  are  often  referred  to  as  folk 
theories or mental models [13]. Such networks constitute people’s explanations for how things
work,  and  result  in  judgments  and  decisions  that  influence  their  behaviour. 

From  this  perspective  culture  refers  to  mental  models,  and  other  contents  of  the  mind,  for  which 
there  is  some  level  of  concordance  across  members  of  a  population  over  a  period  of  time.  A 
potential  issue  associated  with  this  definition  of  culture  is  how,  then,  to  define  the  population  of 
interest.  The  term  cultural  group  refers  to  a  population  or  sub-population  of  people  that  largely 
share  the  interconnected  ideas  of  interest.  The  issue  is  that  cultural  groups  are  distinct  from,  but 
related  to,  demographic  groups  (i.e.  groups  based  on  nationality,  educational  status,  etc.)  in  that 
the  demographic  delineations  relevant  to  a  particular  cultural  group  will  depend  on  how 
widespread  the  cultural  ideas  of  interest  are.  For  example,  Sunni  and  Shia  sectarian  distinctions 
make  little  difference  if  the  idea  of  interest  is,  “There  is  no  god  but  Allah,  and  Mohammad  is  his 
prophet.” However, if the relevant common beliefs include those pertaining to the 12th Imam,
then  that  demographic  does  become  important.  Hence,  the  relevant  cultural  group  for  a  study 
will depend on the cultural domain, that is, the kind and topic of knowledge of interest.

Sunni  Jihadist  Cultural  Model 

Consider  a  Sunni  Muslim  extremist  conception  of  socio-political  relationships  between  Islam 
and  the  West.  A  mental  model  of  such  relationships  contains  an  individual  person’s  concepts  as 
well  as  their  understanding  of  the  causal  relationships  between  concepts,  i.e.  the  antecedents  and 
consequences  of  political  activities  and  their  outcomes.  This  mental  model  influences  the 
individual’s expectations for how socio-political relationships will unfold and provides a
framework for selecting behaviours and goals within this context. Figure 1 provides a network
representation that might describe a Sunni Muslim’s mental model of current political events.
The set of ideas represented in Figure 1 was extracted from articles that describe jihadist
narratives, and is presented here for illustrative purposes[14] [15]. Figure 1 depicts a number of
ideas using circles, lines, and colour. These ideas include simple concepts such as “Western
arrogance” and “Muslim honour”, represented as circles. The figure also includes causal ideas,
such as the belief that development of a new Islamic caliphate would decrease the extent of
Western dominance and bring about a return of past Islamic glory. These are represented as lines
in the figure, with +/- indicating the direction of the causal belief. Finally, Figure 1 portrays
ideas of desired states or value using colour, as well as a logical flow across desired states.
Developing an Islamic
caliphate  is  a  good  thing.  Maintaining  (and  enhancing)  Muslim  honour  is  likewise  valued. 


Figure  1.  Sunni  jihadist  cultural  model  of  political  relationships 

According to the model, jihad is viewed positively and should be supported by the model’s
adherents because of its anticipated consequences for Muslims. Most directly, support for
jihad  decreases  the  chances  that  the  West  will  continue  its  war  against  Islam,  and  enhances 
collective  Muslim  honour.  Holding  the  beliefs  described  by  this  mental  model  is  likely  to  have 
fairly  strong  consequences  for  how  a  person  will  decide  and  act  in  a  number  of  specific,  relevant 
situations. 

As  implied  by  the  name,  mental  models  reside  inside  the  heads  of  individuals.  However,  when 
people  communicate  with  each  other  in  any  variety  of  modes,  they  develop  mental  models  that 
may  begin  to  resemble  one  another.  Mental  models  can  spread  widely  throughout  a  population, 
becoming  ‘cultural’  in  the  sense  of  being  shared  by  many  of  its  members.  A  cultural  model 
refers  to  an  external  representation  of  a  set  of  culturally-shared  mental  models  that  is  constructed 
by  a  researcher.  A  cultural  model  represents  a  consensus  of  the  mental  models  for  a  particular 
cultural  group  and  domain.  Hence,  for  the  Sunni  Muslims  who  hold  beliefs  similar  to  the 
elements in this model, Figure 1 serves as one of their cultural models in the domain of
socio-political relationships.
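
As a hedged illustration of how a consensus model with prevalence information might be
assembled, the sketch below treats each individual's mental model as a set of signed causal links
and keeps the links held by at least a given share of individuals. The data structure, the 0.5
threshold, and the abstract concept names are assumptions for illustration; they are not the CNA
procedure itself.

```python
from collections import Counter


def consensus_model(mental_models, min_prevalence=0.5):
    """Map each link to its prevalence, keeping links that are shared widely enough.

    mental_models: list of sets of (cause, effect, sign) tuples, one set
    per individual; sign is +1 or -1.
    """
    counts = Counter(link for model in mental_models for link in model)
    n = len(mental_models)
    return {
        link: count / n
        for link, count in counts.items()
        if count / n >= min_prevalence
    }


# Three hypothetical individuals with partly overlapping mental models.
individuals = [
    {("A", "B", +1), ("B", "C", -1), ("A", "C", +1)},
    {("A", "B", +1), ("B", "C", -1)},
    {("A", "B", +1)},
]
shared = consensus_model(individuals)
```

Here the link from A to B is held by everyone, the link from B to C by two of three individuals,
and the link from A to C falls below the threshold and is left out of the consensus model.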

Considering Figure 1 as a cultural model gives us a precise way of identifying cultural
transmission and cultural change [16]. For example, suppose the prospect of return to a glorious
Islamic civilisation is the most salient perceived outcome that is positively influenced by the
concept of supporting jihad. A change in the causal belief chain so that jihad in the present
situation is seen instead as decreasing the chances of a glorious Islamic revival could effect a
change in the value (or attitude) associated with acts that support jihad. That is, we might
observe a change in the overall cultural model resulting from this shift in the specific causal
chain of beliefs that link jihad to Islamic glory. Such an attitude change might then result in a
re-examination and reinterpretation of Islamic texts, or at least of the salience of such messages.
This example highlights the interrelation between causal beliefs and values, in addition to
illustrating how cultural models can represent cultural transmission.
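
The example can be made concrete with a signed directed graph, in which each causal belief is an
edge carrying +1 or -1 and each valued state carries a desirability of +1 or -1. In this
hypothetical sketch, the implied value of an action is the sum, over the valued outcomes it
directly influences, of edge sign times outcome desirability. The scoring rule, node names, and
numbers are simplifying assumptions for illustration and are not part of CNA as described.

```python
def implied_value(action, edges, desirability):
    """Sum of (edge sign x outcome desirability) over the action's direct effects.

    edges: dict mapping (cause, effect) -> +1 or -1 (direction of causal belief)
    desirability: dict mapping outcome -> +1 (desired) or -1 (undesired)
    """
    return sum(
        sign * desirability.get(effect, 0)
        for (cause, effect), sign in edges.items()
        if cause == action
    )


# Toy model loosely following the description of Figure 1.
edges = {
    ("support_jihad", "islamic_revival"): +1,
    ("support_jihad", "war_against_islam"): -1,
}
desirability = {"islamic_revival": +1, "war_against_islam": -1}

before = implied_value("support_jihad", edges, desirability)

# Cultural change: the action is now believed to reduce the chance of revival.
changed = dict(edges)
changed[("support_jihad", "islamic_revival")] = -1
after = implied_value("support_jihad", changed, desirability)
```

Flipping the sign of a single causal belief lowers the implied value of the action, which mirrors
the shift in attitude described above.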

Cultural  Values,  Models  and  Domains 

Cultural  psychologists  have  often  conceptualised  culture  in  terms  of  lists  of  domain-general, 
stable  traits,  such  as  individualist-collectivist  value  orientations  [17],  Researchers  operating 
within  this  programme  aim  to  find  a  core  set  of  dimensions  for  characterising  cultures  that  they 
believe  to  be  important  across  a  wide  variety  of  domains.  The  idea  is  to  provide  purely 
analytical  predictions,  a  priori,  about  cultural  groups  that  are  widely  applicable  to  many 
particular  problems.  For  example,  cultural  researchers  from  this  perspective  might  attempt  to 
understand  popular  support  for  jihad  in  Middle  Eastern  countries  by  considering  the  general  level 
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of  disparity  of  power  held  by  members  of  those  societies.  An  important  assumption  about 
culturally-shared  mental  models,  in  contrast,  is  that  they  are  highly  specific  to  particular  domains 
[18].  That  is,  activities  such  as  participation  in  a  rally  for  Hezbollah  are  supported  by  mental 
models  that  are  tailored  to  those  specific  activities  Hence  the  culturally-shared  mental  models 
comprise  values,  beliefs,  and  concepts  that  are  salient  to  members  of  a  particular  culture  in 
particular  contexts,  and  may  well  not  generalise  to  other  situations.  Multiple  cultural  values  are 
reflected  in  people’s  mental  models,  and  certain  values  may  be  more  important  than  others 
depending upon the situation, a phenomenon sometimes known as value trumping [19]. For
example, Americans typically place a high value on freedom of speech; however, they may also
support censorship or restricted access to information in certain cases (e.g., for extremely violent or
sexually-explicit  content).  Hence,  from  the  cultural  models  perspective  it  is  difficult  to 
understand  the  cultural  considerations  that  are  relevant  within  a  particular  context  by  starting 
with  pre-existing  lists  of  “domain  general”  cultural  values.  This  suggests  that  it  is  preferable  to 
begin  cultural  analysis  of  a  new  domain  in  a  more  exploratory  fashion,  allowing  values  to 
emerge from the analysis along with their related cultural concepts and causal beliefs [20].

Mental  models  are  naturally  domain  specific  because  they  are  explanations  of  the  workings  of 
particular  artefacts  and  natural  processes.  Furthermore,  mental  models  can  vary  across  cultures  in 
ways that are constrained only by the domain itself and any cognitive universals that ground
shared  understanding  across  humanity  [21].  Most  work  on  mental  models  has  focused  on  the 
physical  domain,  though  people  also  possess  mental  models  that  pertain  to  the  psychological  and 
social domains, as exemplified in Figure 1 [22]. A cultural model represents a consensus of
mental  models  within  the  context  of  a  particular  domain. 

One  specific  approach  to  cultural  modelling  begins  by  identifying  the  judgements  or  decisions  of 
primary  interest  for  study,  such  as  a  decision  to  engage  in  suicide  terrorism.  The  decisions  chosen 
arise  in  specific  contexts  as  defined  by  critical  incidents  or  scenarios.  They  are  made  by  members 
of  the  cultural  group  being  investigated,  typically  in  a  way  that  is  surprising  or  confusing  to 
members  outside  the  group.  Once  the  key  decisions  are  identified,  investigators  build  models  of 
the cultural ideas that directly influence those decisions. This approach, called “cultural network
analysis”, ensures that the aspects of culture investigated are relevant to the decisions of interest.

Cultural  Network  Analysis 

Cultural  network  analysis  is  a  method  for  describing  ideas  that  are  shared  by  members  of  cultural 
groups, and relevant to decisions within a defined situation [23]. CNA discriminates between
three  kinds  of  ideas:  concepts,  values,  and  beliefs  about  causal  relations.  The  cultural  models 
resulting from CNA use network diagrams to show how all the ideas relate to one another. The
CNA approach also includes the full set of techniques needed to build cultural model diagrams.
This  consists  of  specific  methods  to  elicit  the  three  kinds  of  ideas  from  people  in  interviews  or 
survey  instruments,  extract  the  ideas  from  interview  transcripts  or  other  texts,  analyse  how 
common  the  ideas  are  between  and  within  cultural  groups,  and  align  and  assemble  the  common 
ideas  into  complete  maps.  CNA  shares  aspects  with  other  approaches  to  cultural  analysis, 
especially cognitive approaches developed by anthropologists [24]. However, as a complete
method it has several features that distinguish it from other ways of examining
cultures. These include an emphasis on ensuring the relevance of cultural models to key
decisions, which provides a more direct link to actual behaviour; portrayal of the cultural insider or
‘emic’ perspective; modelling of interrelated networks of ideas rather than treating ideas as
independent entities; and direct estimation of the actual prevalence of ideas in the
network rather than reliance on vaguer notions of sharedness.

Cultural  Network  Analysis  comprises  an  exploratory  phase  and  a  confirmatory  phase.  In  the 
exploratory  phase,  concepts  and  mental  models  are  extracted  from  qualitative  sources,  such  as 
interviews  and  open  source  media  (web  news,  blogs,  email),  with  little  presupposition  regarding 
the  elicited  contents.  One  goal  of  this  phase  is  to  develop  an  initial  understanding  of  the  concepts 
and  characteristics  that  are  culturally  relevant  within  the  domain.  A  second  objective  is  to  obtain 
initial  graphical  representations  of  people’s  mental  models  in  forms  that  closely  match  their  own 
natural  representational  structure.  Qualitative  analysis  and  representation  at  this  stage  yield 
insights  that  can  be  captured  in  initial  cultural  models.  Often,  qualitative  analysis  may  be  all  that 
is  needed  for  applications.  The  exploratory  phase  also  generates  a  wealth  of  material  for 
constructing  subsequent  structured  data  collection  in  a  confirmatory  phase.  In  the  confirmatory 
phase  of  CNA,  structured  interviews,  field  experiments,  and  automated  semantic  mining  of  web- 
based  sources  are  used  to  obtain  systematic  data  that  is  more  amenable  to  statistical  analysis. 
Statistical  models  used  by  cognitive  anthropologists  and  market  researchers  are  employed  to 
assess  the  patterns  of  agreement  and  derive  statistics  describing  the  distribution  of  concepts, 
causal  beliefs,  and  values.  Finally,  formal  representations  of  the  cultural  models  are  constructed 
that  illustrate  the  statistical  and  qualitative  information  in  diagrams.  Influence  diagrams  are  an 
important  representation  format  for  cultural  models,  as  illustrated  in  Figure  1.  Formal 
representation  makes  it  possible  to  use  cultural  models  in  a  variety  of  applied  contexts. 
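
The confirmatory-phase step of estimating how widely concepts and causal beliefs are held can be illustrated with a minimal sketch. It assumes that each respondent's mental model has already been coded as a list of concepts and a list of (cause, effect) links; the function names, the data layout, the placeholder labels "A", "B", "C" and the threshold value are all hypothetical and are not taken from the CNA literature.

```python
from collections import Counter


def aggregate_mental_models(respondents):
    """Return the fraction of respondents holding each concept and each causal link."""
    n = len(respondents)
    concept_counts = Counter()
    link_counts = Counter()
    for r in respondents:
        # Use sets so that an idea counts at most once per respondent.
        concept_counts.update(set(r["concepts"]))
        link_counts.update(set(r["links"]))
    concepts = {c: k / n for c, k in concept_counts.items()}
    links = {link: k / n for link, k in link_counts.items()}
    return concepts, links


def shared_links(links, threshold):
    """Keep only the causal links whose prevalence reaches the threshold."""
    return sorted(link for link, p in links.items() if p >= threshold)


# Hypothetical coded mental models from four respondents (placeholder labels).
respondents = [
    {"concepts": ["A", "B"], "links": [("A", "B")]},
    {"concepts": ["A", "B", "C"], "links": [("A", "B"), ("B", "C")]},
    {"concepts": ["A", "C"], "links": [("A", "C")]},
    {"concepts": ["A", "B"], "links": [("A", "B")]},
]

concepts, links = aggregate_mental_models(respondents)
consensus_links = shared_links(links, threshold=0.5)
```

The prevalence values attached to nodes and links are the kind of statistic that a cultural model diagram could display; the formal statistical models mentioned above are more elaborate than this simple proportion.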

Cultural  Models  and  Terrorist  Cognition 

Cultural  modelling  and  the  epidemiological  view  of  culture  can  help  to  further  understand  the 
shared  cognition  of  terrorists  and  their  audiences.  From  the  epidemiological  view,  culture  is 
made up of contagious ideas, that is, ideas that propagate effectively within a population [25].
Two broad objectives of research from this cultural epidemiology viewpoint are to characterise
the  current  distribution  of  mental  models  within  cultural  groups,  and  to  understand  the  dynamics 
of  culture. 

Fundamental  cultural  research  programs  from  this  perspective  seek  to  address  why  some  ideas 
are  more  infectious  than  others,  and  to  explain  the  most  widely  distributed  and  long-lasting  ideas 
within  a  population.  Research  for  practical  purposes  has  a  slightly  different  focus.  From  a 
decision-making  standpoint,  for  example,  we  recognise  that  many  ideas  may  be  pervasive  but 
inconsequential to decisions of practical interest [26]. Hence, a decision-centred approach to
culture  and  cognition  begins  with  critical  judgements  and  decisions  that  are  made  by  members  of 
a  cultural  group.  For  example,  we  conceive  of  the  decision  to  accept  the  terrorist  group’s 
worldview as the central node within the highest level of a hierarchy of terrorist cultural models.
Using  Cultural  Network  Analysis,  we  can  study  the  networks  of  causally-interconnected  ideas 
that  are  relevant  to  those  decisions  in  order  to  answer  a  host  of  questions,  such  as: 

1.  What  is  the  distribution  of  mental  models  shared  among  particular  terrorist  groups  and 
their  potential  supporters? 

2.  How  did  the  distribution  get  to  be  that  way? 

3.  How  stable  are  those  distributions? 

4.  In  what  ways  are  the  distributions  changing  over  time? 

5.  How  do  individual  ideas  influence  one  another  in  these  cultural  belief  networks? 

Resulting  cultural  models  and  descriptions  of  their  dynamics  from  such  studies  can  provide 
considerable  insight  into  the  thinking  behind  communications  that  stem  from  terrorist  groups. 
They  also  provide  a  basis  for  developing  effective  counter-communications  by  aiding  in  the 
determination  of  what  makes  for  culturally  meaningful  messages.  Cultural  models  would  allow 
for  making  predictions  concerning  the  effectiveness  of  a  message  by  providing  the  opportunity  to 
assess  potential  unintended  inferences  that  individuals  with  a  certain  knowledge  structure  might 
make.  Specifically,  in  a  cultural  models  diagram,  each  concept  and  causal  belief  represents  an 
opportunity  to  effect  a  change  in  beliefs  or  concepts.  Hence,  such  diagrams  can  provide  an 
orderly basis for determining the content of communications. Messages are created so as to
affect the values of the most vulnerable concept nodes (i.e., those for which there is the least
consensus); these changes then propagate across perceived influences to affect the values of other
concepts. The effects spread through the cultural belief network, ultimately changing overall
perceptions or cognitions. With this CNA approach, information efforts focus on
transmitting the most relevant information to effect conceptual change in a way that makes sense
within the cultural group’s understanding.

If  the  cultural  group’s  understanding  is  mapped  out  in  this  way  using  their  culturally  relevant 
concepts  and  causal  beliefs,  then  it  can  be  relatively  straightforward  to  identify  critical  concepts 
for  targeting  messages.  Pursuing  this  strategy  requires  the  following  steps: 

*  create  a  cultural  model  relevant  to  the  action  or  belief  of  interest; 

*  obtain  relevant  quantitative  estimates  of  parameters  in  the  model; 

*  simulate  the  cultural  change  effects  of  changes  to  detail-level  concept  values; 

*  identify  the  most  vulnerable  concepts  and  concept  values  as  those  for  which  the 
most  disagreement  exists; 

*  compose  messages  to  affect  the  values  of  those  concepts. 
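
The steps above can be sketched in code under strong simplifying assumptions. The sketch assumes an acyclic cultural model with hypothetical concept names "X", "Y" and "Z", a consensus score between 0 and 1 for each concept, and signed link weights; the linear propagation rule is an illustrative assumption, not the simulation method used in CNA studies.

```python
def most_vulnerable(consensus, k=1):
    """Return the k concepts with the lowest consensus (the most disagreement)."""
    return sorted(consensus, key=lambda concept: consensus[concept])[:k]


def propagate(edges, order, target, shift):
    """Spread a change in one concept's value along weighted causal links.

    edges maps (cause, effect) to a signed weight; order lists the concepts
    in topological order, so every cause is processed before its effects.
    """
    delta = {concept: 0.0 for concept in order}
    delta[target] = shift
    for concept in order:
        for (cause, effect), weight in edges.items():
            if cause == concept:
                delta[effect] += weight * delta[concept]
    return delta


# Hypothetical model parameters (illustrative values only).
consensus = {"X": 0.9, "Y": 0.4, "Z": 0.7}
edges = {("X", "Y"): 0.5, ("Y", "Z"): 0.5}
order = ["X", "Y", "Z"]

target = most_vulnerable(consensus)[0]
effects = propagate(edges, order, target, shift=-0.5)
```

Here the message is aimed at the concept with the least consensus, and the simulated shift reaches downstream concepts in proportion to the link weights while leaving upstream concepts unchanged.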

In sum, the results of CNA studies can provide valuable input to the development of accurate
models  of  terrorist  decision  making,  as  well  as  for  the  cognitive  characterisation  of  groups  based 
on  their  ideological  commitments.  A  critical  aspect  of  establishing  an  environment  unfavourable 
to  extremist  ideas  is  to  begin  to  take  apart  the  rhetoric  of  terror-sponsoring  organisations,  and 
address their ideologies through communication [27]. In doing this, we may find ways to remove
the appeal of religiously inspired myths of terrorist acts as the glorious correction of moral
wrongdoing [28].

Acknowledgements 

This  paper  was  supported  by  Contract  N00014-10-C-0078  from  the  Office  of  Naval  Research. 
The  author  thanks  Louise  Rasmussen  and  Paul  Smart  for  fruitful  discussions  on  these  topics,  and 
two  anonymous  referees  for  helpful  comments  on  an  earlier  version  of  the  paper. 

Author  Biography 

Winston  R.  Sieck,  PhD,  is  a  principal  scientist  at  Applied  Research  Associates,  where  he  leads 
the Culture and Cognition Group. He conducts fundamental and applied research on culture and
decision  making,  including  topics  such  as  terrorist  cognition,  ideological  conviction,  intercultural 
understanding  and  cross-cultural  communication.  He  received  a  PhD  in  Psychology  with 
emphasis  on  cognition,  culture,  and  decision  making  from  the  University  of  Michigan  in  2000. 
Email:  wsieck@ara.com 

Notes 

[1] Sieck, W. R., L. J. Rasmussen, et al. (2010). Cultural network analysis: A cognitive approach to cultural modeling. Network Science for
Military Coalition Operations: Information Extraction and Interaction. D. Verma. Hershey, PA, IGI Global: 237-255.


12 


Approved  for  Public  Release  (Distribution  is  Unlimited) 


A-l  1 


Cognitive  Solutions  Division 
Applied  Research  Associates,  Inc. 


Prime  Contract  No.:  N00014-10-C-0078 


Final  Report 


Journal  of  Terrorism  Research 


Volume  2,  Issue  1 


[2] Atran, S. and M. Sageman (2006). "Connecting the dots." Bulletin of the Atomic Scientists 62(4): 68.

[3] Atran, S. (2003). "Genesis of suicide terrorism." Science 299: 1534-1539.

[4] Barsalou, J. (2002). "Islamic extremists: How do they mobilize support?" United States Institute of Peace Special Report (89): 1-8.

[5] Speckhard, A. (2006). Sacred terror: Insights into the psychology of religiously motivated terrorism. Faith-based radicalism: Christianity,
Islam and Judaism between constructive activism and destructive fanaticism. C. Timmerman, D. Hutsebaut, S. Mells, W. Nonneman and W. V.
Herck. Antwerp, Belgium, UCSIA.

[6] Hassan, N. (2001). "An arsenal of believers." The New Yorker November 19: 36-41.

[7] Atran, S. (2003). "Genesis of suicide terrorism." Science 299: 1534-1539.

[8] Sageman, M. (2008). "The next generation of terror." Foreign Policy March/April: 37-42.

[9] Speckhard, A. (2006). Sacred terror: Insights into the psychology of religiously motivated terrorism. Faith-based radicalism: Christianity,
Islam and Judaism between constructive activism and destructive fanaticism. C. Timmerman, D. Hutsebaut, S. Mells, W. Nonneman and W. V.
Herck. Antwerp, Belgium, UCSIA.

[10] Juergensmeyer, M. (2000). Terror in the mind of God: The global rise of religious violence. Berkeley, CA, University of California Press.

[11] Rohner, R. P. (1984). "Toward a conception of culture for cross-cultural psychology." Journal of Cross-Cultural Psychology 15(2): 111-138.

[12] Sperber, D. (1996). Explaining culture: A naturalistic approach. Malden, MA, Blackwell.

[13] Gentner, D. and A. L. Stevens (1983). Mental Models. Hillsdale, NJ, Lawrence Erlbaum Associates.

[14] Hafez, M. M. (2007). "Martyrdom mythology in Iraq: How jihadists frame suicide terrorism in videos and biographies." Terrorism and
Political Violence 19: 95-115.

[15] Sageman, M. (2008). "The next generation of terror." Foreign Policy March/April: 37-42.

[16] Norenzayan, A. and S. Atran (2004). Cultural transmission of natural and nonnatural beliefs. The psychological foundations of culture. M.
Schaller and C. Crandall. Hillsdale, NJ, Lawrence Erlbaum Associates, Inc.

[17] Hofstede, G. (2001). Culture's consequences. Thousand Oaks, CA, Sage.

[18] Hirschfeld, L. and S. Gelman, Eds. (1994). Mapping the mind: Domain specificity in cognition and culture. New York, Cambridge
University.

[19] Osland, J. S. and A. Bird (2000). "Beyond Sophisticated Stereotyping: Cultural Sensemaking in Context." Academy of Management
Executive 14(1): 65-79.

[20] Sieck, W. R., A. P. Grome, et al. (2010). Expert cultural sensemaking in the management of Middle Eastern crowds. Informed by
Knowledge: Expert Performance in Complex Situations. K. L. Mosier and U. M. Fischer, Taylor and Francis.

[21] Hirschfeld, L. and S. Gelman, Eds. (1994). Mapping the mind: Domain specificity in cognition and culture. New York, Cambridge
University.

[22] McHugh, A. P., J. L. Smith, et al. (2008). Cultural variations in mental models of collaborative decision making. Naturalistic Decision
Making and Macrocognition. J. M. C. Schraagen, L. Militello, T. Ormerod and R. Lipshitz. Aldershot, UK, Ashgate Publishing Limited: 141-158.

[23] Sieck, W. R. (2010). Cultural network analysis: Method and application. Advances in Cross-Cultural Decision Making. D. Schmorrow and
D. Nicholson. Boca Raton, CRC Press / Taylor & Francis, Ltd: 260-269.

[24] D'Andrade, R. G. (1981). "The cultural part of cognition." Cognitive Science 5: 179-195.

[25] Sperber, D. (1996). Explaining culture: A naturalistic approach. Malden, MA, Blackwell.

[26] Bostrom, A., B. Fischhoff, et al. (1992). "Characterizing mental models of hazardous processes: A methodology and an application to radon."
Journal of Social Issues 48(4): 85-100.

[27] Speckhard, A. (2006). Sacred terror: Insights into the psychology of religiously motivated terrorism. Faith-based radicalism: Christianity,
Islam and Judaism between constructive activism and destructive fanaticism. C. Timmerman, D. Hutsebaut, S. Mells, W. Nonneman and W. V.
Herck. Antwerp, Belgium, UCSIA.

[28] Sageman, M. (2008). "The next generation of terror." Foreign Policy March/April: 37-42.



Bibliography 

Atran, S. (2003). "Genesis of suicide terrorism." Science 299: 1534-1539.

Atran, S. and M. Sageman (2006). "Connecting the dots." Bulletin of the Atomic Scientists
62(4): 68.

Barsalou, J. (2002). "Islamic extremists: How do they mobilize support?" United States Institute
of Peace Special Report (89): 1-8.

Bostrom, A., B. Fischhoff, et al. (1992). "Characterizing mental models of hazardous processes:
A methodology and an application to radon." Journal of Social Issues 48(4): 85-100.

D'Andrade, R. G. (1981). "The cultural part of cognition." Cognitive Science 5: 179-195.

Gentner, D. and A. L. Stevens (1983). Mental Models. Hillsdale, NJ, Lawrence Erlbaum
Associates.

Hafez, M. M. (2007). "Martyrdom mythology in Iraq: How jihadists frame suicide terrorism in
videos and biographies." Terrorism and Political Violence 19: 95-115.

Hassan, N. (2001). "An arsenal of believers." The New Yorker November 19: 36-41.

Hirschfeld, L. and S. Gelman, Eds. (1994). Mapping the mind: Domain specificity in cognition
and culture. New York, Cambridge University.

Hofstede, G. (2001). Culture's consequences. Thousand Oaks, CA, Sage.

Juergensmeyer, M. (2000). Terror in the mind of God: The global rise of religious violence.
Berkeley, CA, University of California Press.

McHugh, A. P., J. L. Smith, et al. (2008). Cultural variations in mental models of collaborative
decision making. Naturalistic Decision Making and Macrocognition. J. M. C. Schraagen, L.
Militello, T. Ormerod and R. Lipshitz. Aldershot, UK, Ashgate Publishing Limited: 141-158.

Norenzayan, A. and S. Atran (2004). Cultural transmission of natural and nonnatural beliefs. The
psychological foundations of culture. M. Schaller and C. Crandall. Hillsdale, NJ, Lawrence
Erlbaum Associates, Inc.

Osland, J. S. and A. Bird (2000). "Beyond Sophisticated Stereotyping: Cultural Sensemaking in
Context." Academy of Management Executive 14(1): 65-79.

Rohner, R. P. (1984). "Toward a conception of culture for cross-cultural psychology." Journal of
Cross-Cultural Psychology 15(2): 111-138.

Sageman, M. (2008). "The next generation of terror." Foreign Policy March/April: 37-42.



Sieck, W. R. (2010). Cultural network analysis: Method and application. Advances in Cross-
Cultural Decision Making. D. Schmorrow and D. Nicholson. Boca Raton, CRC Press / Taylor &
Francis, Ltd: 260-269.

Sieck, W. R., A. P. Grome, et al. (2010). Expert cultural sensemaking in the management of
Middle Eastern crowds. Informed by Knowledge: Expert Performance in Complex Situations.
K. L. Mosier and U. M. Fischer, Taylor and Francis.

Sieck, W. R., L. J. Rasmussen, et al. (2010). Cultural network analysis: A cognitive approach to
cultural modeling. Network Science for Military Coalition Operations: Information Extraction
and Interaction. D. Verma. Hershey, PA, IGI Global: 237-255.

Speckhard, A. (2006). Sacred terror: Insights into the psychology of religiously motivated
terrorism. Faith-based radicalism: Christianity, Islam and Judaism between constructive activism
and destructive fanaticism. C. Timmerman, D. Hutsebaut, S. Mells, W. Nonneman and W. V.
Herck. Antwerp, Belgium, UCSIA.

Sperber,  D.  (1996).  Explaining  culture:  A  naturalistic  approach.  Malden,  MA,  Blackwell. 



APPENDIX  B.  ACM  MEDES  PAPER 

Penta,  A.,  Shadbolt,  N.  R.,  Smart,  P.  R.,  &  Sieck,  W.  R.  (2011,  Nov.).  Detection  of  cognitive 
features  from  web  resources  in  support  of  cultural  modeling  and  analysis.  Paper 
presented  at  ACM  Management  of  Emergent  Digital  Ecosystems,  San  Francisco,  CA. 



Detection  of  Cognitive  Features  from  Web  Resources  in 
Support  of  Cultural  Modeling  and  Analysis 


Antonio  Penta,  Nigel  Shadbolt,  Paul  Smart 
School  of  Electronics  and  Computer  Science 
University  of  Southampton, 
Southampton SO17 1BJ, UK
{ap7,nrs,ps02v}@ecs.soton.ac.uk

ABSTRACT 

The World Wide Web serves as a valuable source of culture-relevant
information, which can be used to support cultural
modeling and analysis activities. Part of the challenge in exploiting
the Web as a source of culture-relevant information
relates to the need to detect and extract information about
beliefs, attitudes, and values from a variety of different resources.
The Web features a rich variety of information
resources, and these are seldom categorized with respect
to the dimensions in which cultural analysts are interested.
Exploiting the Web as a source of culture-relevant information
therefore requires techniques and approaches that
enable cultural analysts to extract relevant information and
organize extracted content in various ways. In this paper,
we outline an approach to assist cultural analysts in the extraction
and organization of relevant information. We show
techniques that can be used to extract information about the
attitudes, beliefs, and values of individuals, and how this
data can, in turn, be used to support cultural modeling and
analysis.

Categories  and  Subject  Descriptors 

H [Information Systems]: Social Computing, Cultural Modelling,
Cognitive Features Detection; H.3 [Information Search
and Retrieval]

1.  INTRODUCTION 

The World Wide Web (WWW) serves as a valuable source
of culture-relevant information, which can be used to support
a number of cultural modeling and analysis activities.
A number of factors, however, militate against the widespread
use of the Web in cultural analysis contexts. One difficulty
relates to the fact that Web content is seldom represented
and organized in ways that support cultural modeling and
analysis. If cultural analysts wish to test specific
hypotheses regarding the distribution of beliefs, values and
attitudes (what we collectively refer to as “Cognitive Features”)
among different groups, they are therefore often prevented
from doing so in a Web context because the data is simply
not available in the right format. Typically, culture-relevant
information is embedded in resources containing
other kinds of content, and this makes systematic forms of
data analysis highly problematic. Ideally, what is required
are representational schemes that enable cultural analysts
to flexibly manipulate data in ways that support hypothesis
testing and theory development. A second, not altogether
unrelated, concern associated with the use of the Web as
a source of culture-relevant information relates to the fact
that relevant data is often not explicitly represented in the
target resources. For example, if we are looking for evidence
of particular Cognitive Features in natural language
resources, then we will often have to analyze the meaning
of the source text; seldom will target Cognitive Features be
represented in such a way that they can be easily detected
by automated processing techniques. In light of these difficulties,
it is important to develop a range of information extraction,
representation and manipulation capabilities. Such
capabilities need to be flexible enough to extract a range of
Cognitive Features, and they need to be sensitive enough to
detect those features even when the target features are “hidden”
in natural language texts (a problem that is akin to the
detection of weak signals in a lot of background noise). Finally,
information manipulation capabilities are required to
support hypothesis testing and cultural modeling activities.
The development of these capabilities will support the use
of the Web as a resource for cultural analysis and cultural
model development. In this context, the aims of the paper
are: i) to propose a general framework that can be used to
support the detection, extraction and representation of Cognitive
Features; ii) to show how statistical techniques can be
used to implement the proposed framework; iii) to show
in a preliminary case study how Cognitive Features can
serve as useful patterns for detecting members of an extreme
religious domain. To the best of our knowledge, ours
is the first attempt to propose a computational approach that
addresses cognitive models within a cultural framework. A
survey of cultural influence on social behaviour is
presented in [9]. There is a rich literature concerning
the extraction of particular bodies of information from
Web-based sources [1]; however, most of the techniques
described in the information extraction literature focus
on extraction algorithms while ignoring the
specification and selection of features that can be used to
support extraction goals. In this paper, we use an approach
that is based on the notion of Topic Models. Blei et al. [2]
first used this approach to represent a document as a mixture
of topic distributions, a topic being a statistical distribution
over the words belonging to the vocabulary of a considered
corpus. A number of variants of this approach have been
proposed in the literature; see, for example, [6]. In the current
paper, we adopt the notion of Topic Models, but we extend
it to include a new graphical model component, and
we also give a specific meaning to the distributions used in
our models (this is something that is rarely discussed in the
context of Topic Model research).
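
To make the Topic Model idea concrete, the following minimal sketch computes the word distribution of a single document as a mixture of topics, where each topic is a probability distribution over the vocabulary. The function name, the two toy topics and the mixture weights are hypothetical; the sketch covers only the mixture itself, not the inference procedure or the additional graphical model component introduced in this paper.

```python
def word_distribution(topic_weights, topics):
    """Compute p(w | d) = sum over k of theta[k] * phi[k][w] for one document.

    topic_weights (theta) holds the document's topic proportions, and each
    entry of topics (phi) maps words to their probability under that topic.
    """
    vocabulary = sorted({word for topic in topics for word in topic})
    return {
        word: sum(
            theta_k * topic.get(word, 0.0)
            for theta_k, topic in zip(topic_weights, topics)
        )
        for word in vocabulary
    }


# Two toy topics over a three-word vocabulary (illustrative values only).
topics = [
    {"a": 0.5, "b": 0.5},
    {"b": 0.5, "c": 0.5},
]
theta = [0.5, 0.5]  # the document draws equally on both topics

dist = word_distribution(theta, topics)
```

Because both the topic proportions and each topic sum to one, the resulting word probabilities also sum to one.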

The current paper is organized as follows: in Sections 2
and 3 we provide an overview of our approach to Web-based
knowledge extraction in support of cultural modeling and
analysis; in Sections 4, 5, and 6 we present the methodology
used to represent the analytic substrate of the information
extraction process for the Web-based textual sources;
in Section 7 we describe the technical approach used to analyze
the text sources; and in Section 8 we present a specific
example of our approach focused on the domain of religious
extremism.

2.  CULTURAL  ANALYSIS  AND 
COGNITIVE  FEATURES 

We adopt an epidemiological approach to culture, which
sees inter-individual similarities in cognition as the basis for
cultural groupings [8]. A fundamental assumption of this
perspective is that shared developmental experiences lead
to important similarities in the mental representations (e.g.,
concepts, beliefs and values) that are distributed among
members of a population. The Web appears to be a suitable
place to study how ideas spread through behavioral norms,
discussions, interpretations, and affective reactions within
specific populations. We are interested in models that are
able to elicit, analyse, and represent the beliefs, values, and
cognitive concepts that are shared by members of a cultural
group, how these are connected, and how they affect the
group's decisions. First, let us give an informal definition of
what we mean by a cultural group:

DEFINITION  1.  A  Cultural  Group  is  a  collection  of  people 
who  are  grouped  together  by  virtue  of  their  similarity  along 
specific cognitive dimensions; e.g. commonality of beliefs,
attitudes  and  values. 

Now let us describe how we model the relationship between
a cultural group and the cognitive signatures of its individuals.
We start from the modelling approach
that was developed in [8]. The approach is called
Cultural Network Analysis (CNA). In CNA, a conceptual
model based on a belief network is used to show the cultural
knowledge within a population. We model the Cognitive
Features as follows:

DEFINITION 2. A Cognitive Feature (CF) is one of the
following structures: i) a triple $(B, v, \delta)$, with $B$ being a belief,
$v$ one of the values of $B$, and $\delta \in \mathbb{R} \cap [-1, +1]$ a measure
of the value or belief perception in a group or individual:
negative ($-1 \le \delta < 0$), positive ($0 < \delta \le 1$) or neutral
($\delta = 0$); ii) a triple $(C, E, p)$, with $C$ and $E$ being cause and
effect, respectively, of the causal relationship $C \rightarrow E$, and
$p \in \{+, -\}$ a negative ($-$) or positive ($+$) polarity.

According to our epidemiological approach we simply define
our model as follows:

DEFINITION 3. A Cultural Model ($\mathcal{M}$) is a set of Cognitive
Features.

For example, to understand what is meant by the terms
“belief” and “value”, let us consider the religious domain.
In this domain, we introduce some beliefs known as
meta-cognitive beliefs. The term meta-cognition refers to
beliefs about how one thinks and learns [7]. In particular,
these beliefs are the ones that affect the cognitive processes
that govern feelings of confidence in world-views. Table 1
reports the meta-cognitive beliefs that we introduce together
with their values. We describe the meaning of just one of
these beliefs, Knowledge. The Knowledge belief has two values:
i) Maintenance, which represents ideas that emphasize the
priority and continuance of long-established conceptions of
the world and is used to block new information and
interpretations; ii) Change, which represents a belief that
emphasises knowledge acquisition and change at the individual
and cultural levels, and implies that existing beliefs may be
wrong, incomplete, or no longer fit for current situations.
Further explanation of these meta-cognitive beliefs can be
found in [7].

Meta-Cognitive Belief    Values
Knowledge                Maintenance (KM), Change (KC)
Coherence                Homogeny (CH), Diversity (CD)
Information Exchange     Separation (IES), Interaction (IEI)
Judgement                Authority (JA), Independence (JI)

Table 1: The meta-cognitive beliefs and their values used
to categorize the cultural signals in the religious domain.

An example of a causal relationship in a cultural environment
is the triple $\langle$Religion, Innovation, $-\rangle$. This
means that a rise in Religion is accompanied by a proportional
decrease in Innovation. For example, if we process the
information coming from two kinds of cultural groups
characterized by an extremist ($G_e$) or moderate ($G_m$)
vision of the meaning of religion in the world, we can imagine
obtaining the following cultural models:
$\mathcal{M}_e = \{\langle$Knowledge, maintenance, 0.9$\rangle$,
$\langle$Judgement, authority, 1$\rangle$,
$\langle$Religion, Innovation, $-\rangle$,
$\langle$War, Honour, $+\rangle\}$ for $G_e$;
$\mathcal{M}_m = \{\langle$Knowledge, change, 1$\rangle$,
$\langle$Coherence, Diversity, 0.6$\rangle$,
$\langle$Thinking, Freedom, $+\rangle$,
$\langle$Democracy, Religion, $+\rangle\}$ for $G_m$.
s,  without  taking  into  account  the  idea  of  their  positive, 
negative  or  neutral  attitude. 
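Definitions 2 and 3 can be made concrete with a small data-structure sketch. The Python below is only an illustration: the class names `BeliefFeature` and `CausalFeature` and the `attitude` helper are hypothetical and do not come from the paper. It encodes the extremist model $\mathcal{M}_e$ from the example.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class BeliefFeature:
    """Triple <B, v, delta>: belief, one of its values, perception in [-1, +1]."""
    belief: str
    value: str
    delta: float

    def __post_init__(self):
        if not -1.0 <= self.delta <= 1.0:
            raise ValueError("delta must lie in [-1, +1]")

    @property
    def attitude(self) -> str:
        # Negative, positive or neutral perception, as in Definition 2.
        if self.delta < 0:
            return "negative"
        return "positive" if self.delta > 0 else "neutral"


@dataclass(frozen=True)
class CausalFeature:
    """Triple <C, E, p>: cause, effect and polarity '+' or '-'."""
    cause: str
    effect: str
    polarity: str

    def __post_init__(self):
        if self.polarity not in ("+", "-"):
            raise ValueError("polarity must be '+' or '-'")


# Hypothetical alias: a Cognitive Feature is either kind of triple.
CognitiveFeature = Union[BeliefFeature, CausalFeature]

# A Cultural Model is a set of Cognitive Features (Definition 3).
M_e = frozenset({
    BeliefFeature("Knowledge", "maintenance", 0.9),
    BeliefFeature("Judgement", "authority", 1.0),
    CausalFeature("Religion", "Innovation", "-"),
    CausalFeature("War", "Honour", "+"),
})
```

Frozen dataclasses are used so that features are hashable and can be stored in a set, which mirrors the set-based definition of a Cultural Model.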

3.  COGNITIVE  FEATURE  DETECTION 

One  way  in  which  Cognitive  Features  are  manifested  on 
the  Web  is  in  response  to  the  occurrence  of  particular  events, 
for  example,  military  interventions,  terrorist  attacks,  public 
protests  and  so  on.  These  events  elicit  responses  that  re¬ 
veal  something  about  the  beliefs,  attitudes  and  values  of 
the  respondents  to  the  event  in  question,  and  they  therefore 
reveal  something  about  respondent  cognitions.  Given  the 
aforementioned  cognitive  characterization  of  culture,  we  can 
see  that  individuals’  responses  to  particular  events  can  be  a 
valuable source of cultural information. This is one reason
why the Web serves as a source of cultural information. The
advent of Web 2.0 has supported greater participatory
interaction with the Web, and enabled individuals to
contribute to Web content. If information extraction
technologies can be used to extract information about
individual cognitions from the kind of resources in which
individuals typically express their views (for example, blogs,
Twitter feeds, discussion forums, and so on), then we may be
able to detect some of the features that are important for
cultural analysis and modeling. In order to support this
detection process, we are interested in processing sources
that deliver signals that we define as cognitive. Let us first
introduce informally what we mean by cognitive signals:

Definition 4. The Cognitive Signals are all the messages
in which people express thinking that is referable to the
cultural knowledge within a population. These messages have to
be automatically processed and can be exchanged using different
media.

Examples of sources that convey Cognitive Signals, and of
how these can be used to detect the relationships between
individuals and a cultural group, are presented in Figure 1.
For example, an image can be an important indicator of the
relationships between Web page authors and the cultural
groups to which they belong. A more complex analysis is
required, however, based for example on how much these signs
are used among linked members or on the position in which
they are depicted. Intuitively, an image on the title banner
is more valuable than others. Examples of images are: i) logos
related to political organizations (Figures 1.a, 1.b); ii)
flags (Figures 1.c, 1.d); iii) symbols of terrorist groups
(Figures 1.a, 1.b); iv) images of historical characters
(Figure 1.h). Another example, in this case in video format,
is represented in Figure 1.g. It is the famous speech about
freedom in the movie Braveheart. A high degree of content
sharing within a community provides an indication of how
important notions such as freedom are to a group and how its
members think about freedom. Most of these signals cannot be
processed separately, so a multi-modal analysis using
different media is required. Processing different signals over
different media channels can provide important inputs to
cultural modeling and analysis. In spite of the importance of
multi-modal analysis, much of the input for Cognitive Feature
detection will probably come from text-based sources. In this
paper, we focus our attention on the Cognitive Features
extracted from signals related to text sources. Examples of
text sources that are relevant from the cultural point of view
are depicted in Figure 1.i and Figure 1.l. These sentences
reveal views of the content authors that reflect their
membership of particular cultural groups (for example,
moderate or extremist religious groups). In this setting, the
Cognitive Feature detection process aims to model, extract,
and process those Cognitive Signals in order to detect the
Cognitive Features and eventually structure all the results of
this process in what we call Cognitive Patterns. Let us give a
formal definition of this object.

Definition 5. Let us consider a Cultural Model $\mathcal{M}$
and one of its Cognitive Features $\tau$. A Cognitive Pattern
($\mathcal{P}_{\mathcal{M}}$) associated to $\tau$ belonging to
$\mathcal{M}$ is a set of triples $\langle r, \tau, \mu \rangle$,
where $r$ is a source containing Cognitive Signals referable to
an individual or group within a population and
$\mu \in [0, 1] \cap \mathbb{R}$ is a measure of how reliable $r$
is as a representative of $\tau$ for the considered individual,
group or population.


Figure 1: Examples of Cognitive Signals in different
media formats.


We note that in this setting the resource $r$ can be any data
belonging to a group, in any format, that an expert identifies
as a valuable source of cultural information. This detection
process then aims to populate a “Cultural Pattern Database”
($\mathcal{P}_{db}$) where all this knowledge is stored and
updated by domain experts. Now, let us describe how we deal
with the Cognitive Signals related to the text sources.

4.  THE  TEXT  WEB  SOURCES 

In this section we explain how we model and extract signals
from the text related to a web page. First we give a more
formal definition of how we model the text messages, and then
we explain how we extract our model from a text document.

4.1  Text  Signal  Modeling 

We model the text using a linguistic model known as the
N-gram approach [5]. In this model, the text is divided
into structures, known as gram elements, which are formed
by tokens extracted from the text. Let us consider a text
fragment and assume that we extract from it some gram
elements, looking at the words as linguistic tokens. Firstly,
we use the term gram element type to indicate the type of
n-gram extracted. A gram element type can be associated to
one of the following categories: uni-gram, bi-gram or tri-gram.
Let us introduce the definition of Text Signal as follows:

Definition 6. A Text Signal is a set of gram elements
belonging to the same category. In particular, we use the
following symbols: i) $\mathcal{T}$ for the Text Signal made by
uni-grams; ii) $\mathcal{T}^2$ for the Text Signal made by
bi-grams; iii) $\mathcal{T}^3$ for the Text Signal made by
tri-grams.

Now, looking at the Part of Speech (PoS) tag [5] associated
with each word belonging to a Text Signal, we can
differentiate them as follows:

Definition 7. Let us consider the following elements
$(w_i) \in \mathcal{T}$, $(w_i, w_j) \in \mathcal{T}^2$,
$(w_i, w_k, w_j) \in \mathcal{T}^3$, and let us attach to all of
them their Part of Speech (PoS) labels as follows:
$(w_i/l_i^1)$, $(w_i/l_i^2, w_j/l_j^2)$ and
$(w_i/l_i^3, w_k/l_k^3, w_j/l_j^3)$. We can differentiate these
Text Signals using the computed PoS labels as follows:

•  A Text Entity Signal is a subset of those elements
belonging to $\mathcal{T}$ or $\mathcal{T}^2$ or $\mathcal{T}^3$
that fulfil the following conditions: i) for $\mathcal{T}$,
$l_i^1$ is a noun or proper noun; ii) for $\mathcal{T}^2$, both
$l_i^2$ and $l_j^2$ are nouns or proper nouns; iii) for
$\mathcal{T}^3$, both $l_i^3$ and $l_j^3$ are nouns or proper
nouns and $l_k^3$ is a verb. We use the symbols
$\mathcal{E} \subseteq \mathcal{T}$,
$\mathcal{E}^2 \subseteq \mathcal{T}^2$,
$\mathcal{E}^3 \subseteq \mathcal{T}^3$ to refer to those
subsets.

•  A Text Sentiment Signal is a subset of those elements
belonging to $\mathcal{T}^2$ or $\mathcal{T}^3$ that fulfil the
following conditions: i) for $\mathcal{T}^2$, $l_i^2$ is a noun
or proper noun and $l_j^2$ is an adjective; ii) for
$\mathcal{T}^3$, either $l_i^3$ is a noun or proper noun,
$l_k^3$ is a verb and $l_j^3$ is an adjective, or $l_i^3$ is a
noun or proper noun, $l_k^3$ is an adjective and $l_j^3$ is an
adverb, or $l_i^3$ is a noun or proper noun and both $l_k^3$
and $l_j^3$ are adjectives. We use the symbols
$\mathcal{S}^2 \subseteq \mathcal{T}^2$,
$\mathcal{S}^3 \subseteq \mathcal{T}^3$ to refer to those
subsets.

For example, using the sentence in Figure 1.i, a Text Entity
Signal can be {religion, history, (religion, interpretation),
(religion, root, history)} and a Text Sentiment Signal can
be {(religion, dynamic), (religion, become, dynamic)}.
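The PoS conditions of Definition 7 can be sketched as two predicates over tagged gram elements. This is a minimal illustration under stated assumptions: a gram is represented as a tuple of `(word, tag)` pairs in the order $(w_i, w_k, w_j)$, the tag names (`NOUN`, `PROPN`, `VERB`, `ADJ`, `ADV`) are a hypothetical tag set chosen for readability, and the ordering of the adjective/adverb cases is one reading of the definition.

```python
# Hypothetical tag set: nouns and proper nouns are treated alike.
NOUN_TAGS = {"NOUN", "PROPN"}


def is_entity(gram):
    """True if a tagged uni-, bi- or tri-gram qualifies as a Text Entity Signal."""
    tags = [tag for _, tag in gram]
    if len(tags) == 1:
        return tags[0] in NOUN_TAGS
    if len(tags) == 2:
        return tags[0] in NOUN_TAGS and tags[1] in NOUN_TAGS
    if len(tags) == 3:
        # Outer words are nouns, the middle word is a verb.
        return tags[0] in NOUN_TAGS and tags[1] == "VERB" and tags[2] in NOUN_TAGS
    return False


def is_sentiment(gram):
    """True if a tagged bi- or tri-gram qualifies as a Text Sentiment Signal."""
    tags = [tag for _, tag in gram]
    if len(tags) == 2:
        return tags[0] in NOUN_TAGS and tags[1] == "ADJ"
    if len(tags) == 3 and tags[0] in NOUN_TAGS:
        # noun-verb-adjective, noun-adjective-adverb, or noun-adjective-adjective
        return (tags[1], tags[2]) in {("VERB", "ADJ"), ("ADJ", "ADV"), ("ADJ", "ADJ")}
    return False


# Hand-tagged grams taken from the example in the text.
grams = [
    (("religion", "NOUN"),),
    (("religion", "NOUN"), ("interpretation", "NOUN")),
    (("religion", "NOUN"), ("root", "VERB"), ("history", "NOUN")),
    (("religion", "NOUN"), ("dynamic", "ADJ")),
    (("religion", "NOUN"), ("become", "VERB"), ("dynamic", "ADJ")),
]
entity_signal = [g for g in grams if is_entity(g)]
sentiment_signal = [g for g in grams if is_sentiment(g)]
```

With the hand-tagged grams above, the first three end up in the entity signal and the last two in the sentiment signal, matching the example sets given in the text.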

4.2  Text  Signal  Extraction 

We can now describe the process to extract Text Signals
from an unstructured text document $d$. We first pre-process
$d$ by sending it to a standard Natural Language Processing
(NLP) pipeline made up of the following components:
Sentence Tokenizer, Word Tokenizer, Part of Speech tagger,
Stop Word Eliminator, etc. (more details about these NLP
steps can be found in [5]).¹ After these phases, we represent
each of the sentences of $d$ as a vector of words with a
related vector corresponding to the PoS annotation of each
word. For example, for the $i$-th sentence in $d$, we have a
vector $\vec{x}_i$ made by the words plus an associated vector
$\vec{x}_{l_i}$ of the same cardinality such that
$\vec{x}_{l_i}[k]$ is the PoS label of the word $\vec{x}_i[k]$.
The elements of $\vec{x}_i$ are the words filtered by the
previous NLP pipeline. Then, we can derive a Text Signal for
each vector. In the case of $\mathcal{T}$, we have just the
elements of the vector $\vec{x}_i$; for $\mathcal{T}^2$ and
$\mathcal{T}^3$, instead, we reduce the possible number of
binary and ternary combinations of elements belonging to a
vector by choosing a maximum linguistic dependency that has to
be considered among its words. The strategy used to extract
bi-grams and tri-grams from a vector is depicted in Figure 2.
We choose a dependency window $w$ and from this value we
compute the indexes used to extract the bi-grams and
tri-grams. In Figure 2, the indexes for a generic step $i$ of
our extraction process are depicted for both bi-grams and
tri-grams. Note that in Figure 2 we choose the same dependency
window $w$ for both bi-grams and tri-grams. In general, if we
have a vector of cardinality $N$, we extract: i)
$(N - w_b)\,w_b + \frac{w_b (w_b - 1)}{2} \le \binom{N}{2}$
bi-grams if $N > w_b$, $w_b$ being the dependency window for a
bi-gram, otherwise all the different combinations; ii)
$(N - 2 w_t)\,w_t^2 + [\ldots] \le \binom{N}{3}$ tri-grams if
$N > 2 w_t$, $w_t$ being the dependency window for a tri-gram,
otherwise all the different combinations. After computing
$\mathcal{T}$, $\mathcal{T}^2$, $\mathcal{T}^3$ using a vector
$\vec{x}_i$, we can apply a filter based on the information
computed in $\vec{x}_{l_i}$ in order to obtain the Text Entity
Signal and Text Sentiment Signal as described in Definition 7.

Figure 2: Approach to the extraction of bi-grams and
tri-grams from a vector of words.

¹ If there is a “not” that comes before an adjective, we
collapse the negation with the adjective to create a unique
word; for example, “not good” becomes “not_good”. This is
important in order to deal with the negation of an adjective,
which can completely change the meaning of the adjective
itself.
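The window-based extraction can be sketched as follows. This is an illustrative reading, not the authors' implementation: a bi-gram pairs each word with the next $w$ words, and a tri-gram $(w_i, w_k, w_j)$ takes $w_k$ within $w$ words of $w_i$ and $w_j$ within $w$ words of $w_k$. The tri-gram indexing is an assumption, since Figure 2 is not reproduced here; it is consistent with the leading term $(N - 2w_t)\,w_t^2$ of the count.

```python
from math import comb


def bigrams(words, w):
    """All ordered pairs (words[i], words[j]) with i < j <= i + w."""
    n = len(words)
    return [(words[i], words[j])
            for i in range(n)
            for j in range(i + 1, min(i + w, n - 1) + 1)]


def trigrams(words, w):
    """Triples (words[i], words[k], words[j]) with i < k <= i + w and k < j <= k + w.

    The index scheme is an assumption made for this sketch.
    """
    n = len(words)
    return [(words[i], words[k], words[j])
            for i in range(n)
            for k in range(i + 1, min(i + w, n - 1) + 1)
            for j in range(k + 1, min(k + w, n - 1) + 1)]


def expected_bigram_count(n, w):
    """Closed form for the number of bi-grams given in the text."""
    if n > w:
        return (n - w) * w + w * (w - 1) // 2
    return comb(n, 2)  # window covers the whole vector: all combinations


sentence = ["religion", "become", "dynamic", "root", "history"]
pairs = bigrams(sentence, 2)
triples = trigrams(sentence, 1)
```

For a five-word vector and a window of 2, the closed form gives $(5 - 2) \cdot 2 + 1 = 7$ bi-grams, which is what the enumeration produces.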

5.  COGNITIVE  ANNOTATION 

The problem now is to understand how we can use the previous
text signals in order to detect our Cognitive Features. We
use a supervised approach, where the cultural analysts give an
initial subset of annotated resources that are used by the
methods described in the next sections. We call this initial
set the Cognitive Annotations. For example, a cultural expert
can initially select from the Web a text fragment, such as one
belonging to a blog, because it is valuable for describing the
Cognitive Feature $\tau_j$ within the Cultural Model
$\mathcal{M}_i$. This annotation can be initially given by the
analyst in a way similar to how we describe the Cognitive
Patterns. For example, using the previously defined Cultural
Models $\mathcal{M}_e$ and $\mathcal{M}_m$, their annotations
can be structured as follows:
$\mathcal{P}_{\mathcal{M}_e} = \{\langle$“religion is distorted”,
$\langle$Knowledge, maintenance, 0.3$\rangle$, 1$\rangle\}$;
$\mathcal{P}_{\mathcal{M}_m} = \{\langle$“knowledge is the first
step in firm belief and conviction”,
$\langle$Knowledge, change, 0.8$\rangle$, 0.0$\rangle\}$.
We consider a real scenario in which the resources are
unstructured texts and the cultural analysts can be different,
so we need to define how we process these annotations in order
to build valuable patterns that are related to them. Let us
describe how we process a text annotation $r_k$ that a
cultural analyst made for the Cognitive Feature $\tau_j$
belonging to a Cultural Model $\mathcal{M}_i$, and how we
define these Cognitive Annotations. In particular, we suppose
that each text-based resource used in the annotation is a
sentence. We process this unstructured knowledge to extract
the Entity and Sentiment Text Signals in the same way
described in Section 4 for the Text Signals, using the NLP
pipeline, the bi-gram/tri-gram extraction process and the PoS
filter. After this process, we have the sets $\mathcal{T}$,
$\mathcal{T}^2$ and $\mathcal{T}^3$, or $\mathcal{E}$,
$\mathcal{E}^2$, $\mathcal{E}^3$, $\mathcal{S}^2$ and
$\mathcal{S}^3$. Now we build a Cognitive Annotation as a
special Cognitive Pattern $\mathcal{P}_R$, where a triple is
$\langle A_i, \tau_j, \mu_i \rangle$, $A_i$ being a set of gram
elements computed over the initial resource $r_k$ and $\mu_i$ a
new reliability measure. In particular, we divide these
annotations into: i) Simple Text Cognitive Annotation if $A_i$
is one of the sets $\mathcal{T}$, $\mathcal{T}^2$,
$\mathcal{T}^3$; ii) Entity Text Cognitive Annotation if $A_i$
is one of the sets $\mathcal{E}$, $\mathcal{E}^2$,
$\mathcal{E}^3$; iii) Sentiment Text Cognitive Annotation if
$A_i$ is one of the sets $\mathcal{S}^2$, $\mathcal{S}^3$. Let
us now explain how we compute the new reliability measure
$\mu_i$. In order to compute this new measure we use the
previous annotations. The value $\mu_i$ in
$\langle A_i, \tau_j, \mu_i \rangle$ can be computed as follows:

$\mu_i = \alpha_1 \, \mathrm{avg}(NMI_{A_i}) + \alpha_2 \, \mu_k$  (1)

In this equation, we use a convex combination,
$\alpha_1 + \alpha_2 = 1$, of two terms: the average (avg)
value of the Normalized Mutual Information [5] computed for
each gram element in $A_i$, i.e. $NMI_{A_i}$, and $\mu_k$, the
initial value assigned by a domain expert when annotating the
resource $r_k$. We compute the Normalized Mutual Information
as follows: we indicate with $g$ a gram element belonging to
$A_i$ and with $\tau_j$ the related Cognitive Feature. We
define for $g$ and $\tau_j$ two binary random variables $X_g$
and $Y_{\tau_j}$ respectively, and then we compute an
associated contingency table, such as the one depicted in
Figure 3. In this table, we represent the frequencies related
to how much a gram $g$ is used to describe $\tau_j$ or not.²
In particular, the subscripts 0 and 1 used in the table in
Figure 3 indicate the absence or presence of our variables.

Figure 3: Contingency Table used to compute the
Normalized Mutual Information.

For example, if we measure the events in which the considered
gram element is used to describe $\tau_j$, the joint
probability is
$P(X_g = 1, Y_{\tau_j} = 1) = N_{11} / N_{tot}$, $N_{11}$ being
the number of times the gram element $g$ is used in the
resources associated with the Cognitive Feature $\tau_j$ and
$N_{tot}$ the total number of gram elements of the same gram
element type as $g$ stored in all the annotations. Then, we
compute the Normalized Mutual Information for $g$ and $\tau_j$
as follows:


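$NMI(X_g, Y_{\tau_j}) = \frac{MI(X_g, Y_{\tau_j})}{\min(H(X_g), H(Y_{\tau_j}))}$

$MI(X_g, Y_{\tau_j}) = \sum_{r \in \{0,1\}} \sum_{t \in \{0,1\}} p_{12} \log\left(\frac{p_{12}}{p_1 \, p_2}\right)$  (2)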


where $H(\cdot)$ is the entropy of a random variable,
$p_{12} = P(X_g = r, Y_{\tau_j} = t)$, $p_1 = P(X_g = r)$,
$p_2 = P(Y_{\tau_j} = t)$ and $MI$ is the Mutual Information.
We note that this procedure is useful for understanding which
annotations are the best ones to use within our process. This
procedure can also be triggered every time we have a new
annotation, in order to use the best knowledge collected by
the domain expert. We call “Cognitive Annotation Database”
($CA_{db}$) the place where all this knowledge is stored and
updated by domain experts.
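Equations (1) and (2) can be sketched from a 2x2 contingency table. In the code below the function names and the default weight `alpha1 = 0.5` are assumptions made for illustration; the paper does not state the weights it used. The counts `n11`, `n10`, `n01`, `n00` are the cells for $X_g = r$, $Y_{\tau_j} = t$.

```python
from math import log


def entropy(probs):
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(p * log(p) for p in probs if p > 0)


def nmi(n11, n10, n01, n00):
    """Normalized Mutual Information of two binary variables, as in Eq. (2)."""
    total = n11 + n10 + n01 + n00
    joint = {(1, 1): n11 / total, (1, 0): n10 / total,
             (0, 1): n01 / total, (0, 0): n00 / total}
    # Marginals P(X_g = r) and P(Y_tau = t).
    px = {r: joint[(r, 0)] + joint[(r, 1)] for r in (0, 1)}
    py = {t: joint[(0, t)] + joint[(1, t)] for t in (0, 1)}
    mi = sum(p * log(p / (px[r] * py[t]))
             for (r, t), p in joint.items() if p > 0)
    denom = min(entropy(px.values()), entropy(py.values()))
    # If one variable is constant the ratio is undefined; return 0 by convention.
    return mi / denom if denom > 0 else 0.0


def reliability(nmi_values, mu_k, alpha1=0.5):
    """Convex combination of the average NMI and the expert value, as in Eq. (1)."""
    alpha2 = 1.0 - alpha1
    return alpha1 * sum(nmi_values) / len(nmi_values) + alpha2 * mu_k
```

A gram that occurs exactly when the feature is present gets an NMI of 1, while a gram that is statistically independent of the feature gets 0, so the average in Equation (1) rewards annotations whose grams are specific to the annotated feature.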


6.  THE  GRAM  ELEMENTS  DISTANCES 

² We consider each belief against the rest, as in a
leave-one-out approach.

In this section, we explain how we compare the gram elements,
such as the ones belonging to the sets introduced in the
above sections. Computing semantic distances among the words
in the extracted gram elements can require a lot of time, due
to the complexity of the measures related to the navigation of
the knowledge used to support this computation. In order to
optimize this step, we use a hybrid approach based on
linguistic and semantic distances. This approach chooses a
different distance computation according to the PoS labels
associated with each word of our gram elements. We define the
following strategies:

•  Strategy 1 (S1), based on the Edit similarity, which
measures how many edit operations we need in order to
transform one word into another.

•  Strategy 2 (S2), based on the Jaccard similarity, which
measures how many elements two sets have in common. For each
word we build a set with the synsets retrieved from the
WordNet [4] database that are connected at a maximum distance
of 2 edges in the WordNet graph. We consider only the graph
made by hypernym and hyponym relations for the nouns and only
by the hypernym relations for the verbs. Then, in order to
compute the distance between two input words, we just use the
Jaccard Index on the obtained sets.

•  Strategy 3 (S3), based on the average polarity similarity,
which takes into account whether two adjectives belong to the
same polarity region, i.e. positive, negative or objective. To
compute this measure, we use the SentiWordNet resource [3]. In
more detail, using the knowledge of this resource, we divide
the polarity region into three equal subspaces: positive,
negative and objective. Given an adjective, we retrieve all
its synsets from the SentiWordNet resource. Then, we classify
each retrieved synset with a local polarity indicator based on
the thresholds used in the subspaces definition. Then, we
define a global polarity indicator for this adjective as the
most common local polarity indicator among all of its synsets,
and we also associate to it a global polarity measure computed
as the average of the values that belong to the same subspace
as the global polarity indicator. Now, we compute the average
polarity similarity between two adjectives as the minimum
global polarity measure of the two words if both adjectives
have the same global polarity indicator; otherwise it is 0.
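The three strategies can be sketched without external resources. The code below is illustrative only. The synset sets and polarity scores are passed in as plain Python data rather than read from WordNet or SentiWordNet. The edit similarity is normalised by the longer word, which is an assumption. The local polarity indicator is taken as the label with the highest score, a simplification standing in for the threshold-based subspaces described above.

```python
from collections import Counter


def edit_similarity(a, b):
    """S1: Levenshtein distance by dynamic programming, normalised to [0, 1]."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - prev[-1] / longest


def jaccard_similarity(set_a, set_b):
    """S2: Jaccard index of two sets (e.g. synsets within 2 edges of each word)."""
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def global_polarity(synset_scores):
    """Global polarity indicator and measure for one adjective.

    synset_scores: one dict per synset with the keys
    'positive', 'negative' and 'objective' (hypothetical input format).
    """
    local = [max(scores, key=scores.get) for scores in synset_scores]
    label = Counter(local).most_common(1)[0][0]
    values = [s[label] for s, loc in zip(synset_scores, local) if loc == label]
    return label, sum(values) / len(values)


def polarity_similarity(scores_a, scores_b):
    """S3: minimum global measure if the indicators agree, otherwise 0."""
    label_a, measure_a = global_polarity(scores_a)
    label_b, measure_b = global_polarity(scores_b)
    return min(measure_a, measure_b) if label_a == label_b else 0.0
```

Keeping the lexical resources outside the functions makes it easy to cache the synset sets and polarity scores per word, which is useful because the same words are compared many times.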

In Table 2, we report the strategy used to compute the
distances among words according to their PoS labels. We use
the enumeration introduced in this section to represent the
selected approach; 0 means that we choose not to compute any
distance. We also note that, in the case of adjectives that
are collapsed with their negation, we apply Strategy 3 only if
the other adjective is in the same situation. Now, for
example, let us consider two gram elements
$g_1, g_2 \in \mathcal{E}^3$. The similarity $sim(g_1, g_2)$ is
computed by following the strategies defined in Table 2 for
each couple of words obtained from the words in $g_1$ and in
$g_2$; then the average value is returned. In particular, we
note that only for gram elements belonging to $\mathcal{E}^2$
and $\mathcal{E}^3$ do we consider all the possible couples. We
note also that this approach can take advantage of caching
operations on all the resources involved.

            Noun    P.Noun  Adjective  Verb  Adverb
Noun        [...]
Adjective   0       0       S3         0     0
Verb        0       0       0          S2    0
Adverb      0       0       0          0     S1

Table 2: Strategies used to compute the distances
among words based on PoS tagging. The enumeration
is the one described in Section 6. P.Noun means
Proper Noun.


7.  MINING  THE  COGNITIVE  FEATURES 

In this section we describe how we process a set of
resources in order to detect the introduced Cultural Models.
Let us suppose that we have a network of resources such as
web pages. Each resource can be automatically processed in
order to extract useful information such as images, tables,
text content, links and HTML structures. Let us explain how
we process the text data. We design a process flow, depicted
in Figure 4, that is based on four main modules: Resource
Knowledge Processing, Cultural Knowledge Processing, Context
Selection and Cultural Model Detection. The Resource
Knowledge Processing module extracts all the Text Signals
from an input resource, as described in Section 4. For
example, it takes as input resource a document $d$ and returns
a structure made by a sentence $s_i$ belonging to $d$ and some
useful Text Signals extracted from $s_i$, such as
$\langle s_i, \mathcal{E}_{s_i}, \mathcal{E}^2_{s_i}, \mathcal{E}^3_{s_i}, \mathcal{S}^2_{s_i}, \mathcal{S}^3_{s_i} \rangle$.
The Cultural Knowledge Processing module derives the Cognitive
Annotations related to a selected Cultural Model. It takes as
input a Cultural Model
$\mathcal{M} = \{\tau_1, \ldots, \tau_n\}$ and for each
$\tau_i \in \mathcal{M}$ it retrieves a Cognitive Annotation.
It chooses for each $\tau_i$ the most important Cognitive
Annotation using the reliability measures computed in the
processing described above. It returns for each
$\tau_i \in \mathcal{M}$ a structure such as
$\langle \tau_i, \mathcal{E}_{\tau_i}, \mathcal{E}^2_{\tau_i}, \mathcal{E}^3_{\tau_i}, \mathcal{S}^2_{\tau_i}, \mathcal{S}^3_{\tau_i} \rangle$,
where the sets are the union of the sets of gram elements of
the same gram element type belonging to the same Cognitive
Annotation. For example, $\mathcal{E}^3_{\tau_i}$ is the union
of all the tri-grams that belong to the Entity Text Cognitive
Annotation selected to be representative for $\tau_i$. The
Context Selection module chooses groups of sentences that are
indicative for our further analysis. The Cultural Model
Detection module, instead, evaluates the presence of each
Cognitive Feature in the initial resource. In this way we are
able to understand whether the considered Text Signals have
the cognitive signatures related to the selected Cultural
Model. Let us give more details about the last two modules in
the following subsections.


7.1  Context  Selection  Module 

In this module, we start to consider a different granularity
for our analysis; in particular, we define the Context as a
set of subsequent sentences. At this stage the initial
resource, for example a document, can be seen as
$C = \{c_1, \ldots, c_m\}$, its generic element being

$c_i = \{\langle s_{i-k}, \mathcal{E}_{s_{i-k}}, \mathcal{E}^2_{s_{i-k}}, \mathcal{E}^3_{s_{i-k}}, \mathcal{S}^2_{s_{i-k}}, \mathcal{S}^3_{s_{i-k}} \rangle, \ldots, \langle s_{i+k}, \mathcal{E}_{s_{i+k}}, \mathcal{E}^2_{s_{i+k}}, \mathcal{E}^3_{s_{i+k}}, \mathcal{S}^2_{s_{i+k}}, \mathcal{S}^3_{s_{i+k}} \rangle\}$,

a Context of $2k + 1$ subsequent sentences, with $k < i$,
together with all the text signals extracted for each
sentence.

Figure 4: The modules used in our process.

In this module we define a filter able to select only the
Contexts that need to be processed by the next module. The
filter is designed as a statistical decision process based on
the analysis of the uni-grams of each Context. In particular,
we model the relevance of the Context in terms of trials of a
binary random variable (r.v.). In fact, we map each uni-gram
belonging to a context $c_i$ to an independent and identically
distributed (i.i.d.) binary r.v. and we define a kind of
relevance of the input Context through a Bernoulli process.
Let us consider the set $\mathcal{E}_{c_i}$, made by the union
of the different $\mathcal{E}_{s_j}$, $s_j$ being a sentence
belonging to $c_i$, and $N_{c_i}$ its cardinality. We define a
grade of relevance $r$ as $r$ successes in $N_{c_i}$ trials as
follows:

$P(r \mid N_{c_i}, \theta) = \binom{N_{c_i}}{r} \, \theta^{r} (1 - \theta)^{N_{c_i} - r}$  (3)

For our decision problem, we are interested in the Bayesian
estimation of $\theta$. This operation is done using the
$N_{c_i}$ words as “observed trials”. Let us now explain how we
map the gram elements to a set of binary variables. We
transform each uni-gram $g \in \mathcal{E}_{c_i}$ into a
relevance ($ro$) or non-relevance ($nro$) observation through
the function $f_r$, defined as follows:

$f_r(g, \mathcal{M}) = \begin{cases} ro & \text{if } g \in \mathcal{E}_{\mathcal{M}} \\ ro & \text{if } \exists\, g^* \in \mathcal{E}_{\mathcal{M}} : dist(g, g^*) < \epsilon \\ nro & \text{otherwise} \end{cases}$  (4)


Here $\mathcal{E}_{\mathcal{M}}$ is the union of all the
uni-grams attached to the $\tau_i \in \mathcal{M}$, with
$\mathcal{M}$ the chosen Cultural Model, $dist$ is a distance
between two uni-grams computed according to the strategy
defined in Section 6 and $\epsilon$ is a fixed threshold. For
the estimation of $\theta$ we use as prior a non-informative
Beta distribution $Beta(\theta \mid \alpha, \beta)$, with
$\alpha = \beta = 0.5$. According to Bayesian analysis, the
estimator $\theta$ has a distribution
$Beta(\alpha + n_{ro}, \beta + n_{nro})$, $n_{ro}$ and
$n_{nro}$ being the number of times we observe a relevance or a
non-relevance sample, respectively. Now the selection process
is computed as follows: i) we first select those contexts for
which the base condition $E[\theta] > 0.5$ is verified; ii)
then we send to the next module those contexts whose
discriminative measure of relevance ($dr$) exceeds a given
threshold. The $dr$ is defined as follows:


$dr = \frac{E[\theta] - 0.5}{\sigma(\theta)}$  (5)

$\sigma(\theta)$ being the standard deviation of $\theta$.
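The selection rule can be sketched with the closed-form mean and standard deviation of a Beta distribution. The function names and the list-of-labels input format are assumptions made for this illustration. The prior $\alpha = \beta = 0.5$ and the two-step test follow the text.

```python
from math import sqrt


def relevance_posterior(n_ro, n_nro, alpha=0.5, beta=0.5):
    """Parameters of the Beta posterior after observing the labelled uni-grams."""
    return alpha + n_ro, beta + n_nro


def discriminative_relevance(n_ro, n_nro):
    """Posterior mean E[theta] and the measure dr of Eq. (5)."""
    a, b = relevance_posterior(n_ro, n_nro)
    mean = a / (a + b)
    std = sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean, (mean - 0.5) / std


def select_context(observations, dr_threshold):
    """Keep a Context if E[theta] > 0.5 and dr exceeds the threshold.

    observations: list of 'ro' / 'nro' labels, one per uni-gram
    (hypothetical input format, as produced by a function like f_r).
    """
    n_ro = sum(1 for o in observations if o == "ro")
    n_nro = len(observations) - n_ro
    mean, dr = discriminative_relevance(n_ro, n_nro)
    return mean > 0.5 and dr > dr_threshold
```

Dividing by the posterior standard deviation means that a short Context with a few relevant uni-grams scores lower than a long Context with the same proportion of relevant uni-grams, because the estimate of $\theta$ is less certain.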


7.2  Cultural  Model  Detection  Module 

In this module, we propose a model able to evaluate the
diffusion of the Cognitive Features over the input resources
by analysing the extracted Text Signals in order to extract
cultural evidence from them. First, we translate all the
Contexts selected in the previous module into binary strings
using the Cognitive Annotations extracted from the selected
Cultural Model. This process is performed by a function $f$
defined as follows:

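$f(A^t_{\tau_i}, A^t_{c_j}) = \begin{cases} 1 & \text{if } A^t_{\tau_i} \cap A^t_{c_j} \neq \emptyset \\ 1 & \text{if } \exists\, g^t_{\tau_i} \in A^t_{\tau_i},\ g^t_{c_j} \in A^t_{c_j} : dist_t(g^t_{\tau_i}, g^t_{c_j}) < \epsilon^t_{\tau_i} \\ 0 & \text{otherwise} \end{cases}$  (6)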

Being  A*Cj  a  set  of  text  signal  extracted  from  the  sentences 
belonging  to  a  c3  that  was  selected  in  the  previous  module 
and  the  set  of  gram  belonging  to  the  selected  Cogni¬ 
tive  Annotation.  For  example  A*3  can  be  Af*  Sj,  U  . . . 

U  S^nc  ,  where  {«i,. ..  ,8„c.  }  are  the  sentences  belonging 
to  Cj.  We  note  that  in  this  phase  wo  only  considered  the 
bi-grams  and  tri-grains.  Then,  we  have  dint,  and  that 
are  distances  computed  as  describe  in  Section  <>  and  a  fixed 
threshold  related  to  the  text  element  type  and  Cognitive 
Feature,  respectively.  We  note  that  this  distance  requires 
that  the  input  grams  are  of  the  same  gram  element  type. 
This  means,  for  example,  that  we  compare  a  tri-gram  ex¬ 
tracted  from  an  Entity  Text  Cognitive  Annotaion  with  a 
tri-gram  coming  from  the  Text  Entity  Signal  of  a  Context. 
For the sake of clarity, Figure 5 depicts how we generate
these binary signals from a generic Context c_i. To evaluate
how the Cognitive Features are spread across the Contexts,
we considered a hierarchical Bayesian model that can be seen
as the generative model of these binary strings.

Figure 5: How we build the binary strings from the
Context c_i using the approach described in Section
7.2.
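The encoding of a Context into a single bit can be sketched as follows. The sketch assumes that a bit is set either when the Context and the Cognitive Annotation share a gram exactly, or when two grams of the same length fall under a distance threshold. The distance defined in Section 6 is not reproduced here; a Jaccard distance over the words of each gram is used as a hypothetical stand-in, and all function names are illustrative.

```python
def ngrams(tokens, n):
    """Return the set of n-grams (as tuples) of a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard_distance(gram_a, gram_b):
    """Stand-in distance between two grams (NOT the paper's Section 6 distance)."""
    a, b = set(gram_a), set(gram_b)
    return 1.0 - len(a & b) / len(a | b)


def encode_bit(context_grams, annotation_grams, threshold):
    """Return 1 if the context matches the annotation, 0 otherwise."""
    # Case 1: at least one gram is shared exactly.
    if context_grams & annotation_grams:
        return 1
    # Case 2: some pair of same-length grams is closer than the threshold.
    for g in context_grams:
        for h in annotation_grams:
            if len(g) == len(h) and jaccard_distance(g, h) < threshold:
                return 1
    return 0
```

Comparing only grams of equal length mirrors the requirement that both inputs be of the same gram element type; repeating the call once per text element type would yield one binary string per Context.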


These generative models are well studied in statistical natural
language processing to infer topic distributions over a
corpus. Our approach has some similarity with the graphical
model proposed in [6]. The model is depicted in Figure
6 using the plate notation [6]. In particular, in this generative
model, as described in Figure 6, we have a document
d described by a set of Contexts c_i. From d we sample a
Context x with uniform distribution. Then, for each Context
x, we sample a Cognitive Feature z from its set of Cognitive
Features with a multinomial distribution with parameters θ. Then, from
each Cognitive Feature z we sample a binary variable w
from a binomial distribution with parameter φ four times.
In this setting we have a learning problem, which we address
with Bayesian theory. This means that we want to estimate the
distribution of the latent variables using some data and a prior
over these latent variables. In particular, we use as priors
the Dirichlet distribution (α) for the multinomial distribution
and the Beta distribution (β) for the binomial distribution.
As data to observe we use the binary strings obtained from


Figure 6: Hierarchical Bayesian Model used to estimate
the distribution of Cognitive Features in a
document d. The latent variables are represented
by white nodes.


the different Contexts coming from a document d that were
selected by the previous module. We are interested in estimating
the distribution θ, which describes how
the Cultural Features are spread across the Contexts. To
estimate this distribution we use some equations from the
well-known collapsed Gibbs inference algorithm [6], which is
typically used to estimate the latent variables in Bayesian
graphical models. We note that we use this approach to measure
a distribution of random variables rather than to classify new data
by training a Bayesian learner.
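A simplified collapsed Gibbs sampler for a model of this general shape is sketched below. It is an illustration under stated assumptions, not the authors' implementation: each Context is reduced to one binary string, each Cognitive Feature is assumed to have its own Bernoulli parameter with a symmetric Beta(beta, beta) prior, the mixing proportions have a symmetric Dirichlet(alpha) prior, and both parameters are integrated out. The symbols and the function name are hypothetical.

```python
import random


def collapsed_gibbs(strings, n_features, alpha=1.0, beta=1.0, iters=200, seed=0):
    """Estimate mixing proportions over features from binary strings."""
    rng = random.Random(seed)
    z = [rng.randrange(n_features) for _ in strings]  # latent assignments
    count = [0] * n_features   # contexts assigned to each feature
    ones = [0] * n_features    # 1-bits observed under each feature
    zeros = [0] * n_features   # 0-bits observed under each feature
    for s, k in zip(strings, z):
        count[k] += 1
        ones[k] += sum(s)
        zeros[k] += len(s) - sum(s)

    for _ in range(iters):
        for i, s in enumerate(strings):
            # Remove context i from the sufficient statistics.
            k = z[i]
            count[k] -= 1
            ones[k] -= sum(s)
            zeros[k] -= len(s) - sum(s)
            weights = []
            for j in range(n_features):
                a, b = beta + ones[j], beta + zeros[j]
                w = count[j] + alpha  # Dirichlet-multinomial term
                for bit in s:         # Beta-Bernoulli predictive, bit by bit
                    w *= (a if bit else b) / (a + b)
                    if bit:
                        a += 1
                    else:
                        b += 1
                weights.append(w)
            # Resample the assignment and restore the statistics.
            k = rng.choices(range(n_features), weights=weights)[0]
            z[i] = k
            count[k] += 1
            ones[k] += sum(s)
            zeros[k] += len(s) - sum(s)

    # Posterior-mean proportions given the last sample of assignments.
    total = len(strings) + n_features * alpha
    return [(count[j] + alpha) / total for j in range(n_features)]
```

The returned vector is computed from a single final sample; averaging over several late iterations would give a smoother estimate at the cost of a little more bookkeeping.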

8.  CASE  STUDY 

We apply our framework in the domain of the Islamic religion,
with the aim of understanding how the beliefs introduced
in Table 1 are spread across three main populations: Moderate
Arab, Moderate USA, and Extreme. The data were collected
from web sites in Arabic and English, using the Google
search engine in both languages with keywords related to
our beliefs. In particular, we collected 80 documents; 23 of them
were used to select Cognitive Annotations for each belief
defined in Table 1, and the others were used to run our experiments.
The domain experts report for each belief a set of sentences
that are used as Cognitive Annotations. We note that the
Cultural Model considered in our case study is made only
of belief-based Cognitive Features; in other words,
we consider as a Cognitive Feature a belief with its
values for the Context Selection module and the
belief without its values for the Cultural Model
Detection module. We do not take into account an initial measure
of the attitude of each belief, so we start our process with
this measure set to 0. We also compute for each element in the
Cognitive Annotation a reliable measure, which is a value in the
interval [0, 1] ⊂ ℝ, as described in Section 5. We select
as Context a fixed group of 5 sentences, and we set the
thresholds for both r and dr to 0.8 for the Context Selection
module. For the Cultural Model Detection module,
we use 0.8 for each threshold. We note also that for the documents
written in Arabic, we first ran a machine
translation procedure, and these documents were then corrected
by a native Arabic speaker in order to overcome the
imperfections of the machine translation
algorithms. Table 3 reports the data about the
dimensions of our collection and the average numbers of
selected contexts after running the Context Selection process.


           Num. of      AVG Num.       AVG Num.
           Documents    of Contexts    of selected
13
Arab       18           623            88
Extreme    IS           739            38

Table 3: Summary of the collected data


                 AC-KM    R-KM    AC-KC    R-KC
Moderate USA     22       0.4     32       0.4
Moderate Arab    16       0.7     40       0.6
Extreme          22       0.8     12       0.6

                 AC-CH    R-CH    AC-CD    R-CD
Moderate USA     9        0.7     34       0.7
Moderate Arab    14       0.7     48       0.8
Extreme          m        0.8     12       0.7

                 AC-IES   R-IES   AC-IEI   R-IEI
Moderate USA     IB       0.7     23       0.5
Moderate Arab    13       0.7     27       0.4
Extreme          22       0.6     17       0.4

                 AC-JA    R-JA    AC-JI    R-JI
Moderate USA     19       0.5     33       0.7
Moderate Arab    22       0.8     42       0.6
Extreme          25       0.8     15       0.6

Table 4: Summary of the results in our case study.
The full names of * in AC/R-{*} are depicted in
Table 1.

The results of this case study are described in Table 4 and
in Figure 7. Table 4 reports the average
number of contexts (AC-{*}) and the average value of the
reliable measure (R-{*}) assigned to each Cognitive
Feature. Each row in Table 4 corresponds to a member of the
considered populations. Finally, Figure 7 shows the
average number of contexts assigned to each belief in log scale.
Regarding time performance, we note that for an average
of 700 contexts, 8 Cognitive Features, and 2000 iterations
of the Gibbs sampler, we obtained an estimate of the Cognitive
Feature distribution in about 3 hours of wall-clock time on a
standard 3 GHz, 4 GB RAM PC workstation. We note that
Cognitive Features such as Coherence Homogeneity and
Diversity are good indicators of the cultural differences among
our populations; this is also supported by the better annotation
that the domain experts produced for those beliefs, as suggested
by the average values of the reliable measures associated with
them.

9.  CONCLUSION 

In this paper we have presented a general framework to
analyse cultural behaviour in text data. In particular, we
proposed a methodology based on concepts such as Cognitive
Features, Text Signals and Cognitive Annotations. We also
proposed some computational methods for applying our
framework to text data. These computational methods
come from the area of Bayesian learning. In particular,
we designed a graphical model used to estimate the diffusion of
cultural beliefs within a population. A Case Study in the
extreme religious domain was also reported with some results.
In this Case Study, we can see how the Cognitive Features
can be used to discriminate among different populations from
a cultural perspective.


Figure 7: How the Cognitive Features (CFs) are
spread over the different populations in this case
study. In particular, we have on the x-axis the CFs and
on the y-axis the log of the average number of Contexts.

Abstract. The World Wide Web is a potentially valuable source of information
about  the  cognitive  characteristics  of  cultural  groups.  However,  attempts  to  use 
the  Web  in  the  context  of  cultural  modeling  activities  are  hampered  by  the 
large-scale  nature  of  the  Web  and  the  current  dominance  of  natural  language 
formats.  In  this  paper,  we  outline  an  approach  to  support  the  exploitation  of  the 
Web  to  support  cultural  modeling.  The  approach  begins  with  the  development 
of  qualitative  cultural  models  (which  describe  the  beliefs,  concepts  and  values 
of  cultural  groups),  and  these  models  are  subsequently  used  to  develop  an 
ontology-based  information  extraction  capability.  Our  approach  represents  an 
attempt  to  combine  conventional  approaches  to  information  extraction  with 
epidemiological  perspectives  of  culture  and  network-based  approaches  to 
cultural  analysis.  The  approach  can  be  used,  we  suggest,  to  support  the 
development of models providing a better understanding of the cognitive
characteristics  of  particular  cultural  groups. 

Keywords: cultural network analysis, cultural ontology, cultural model, ontology-based
information extraction, culture, cognition, knowledge extraction, world wide web


1  Introduction 

The  World  Wide  Web  (WWW)  is  a  valuable  source  of  culture-relevant  information, 
and  it  is  therefore  an  important  resource  for  those  interested  in  developing  cultural 
models.  The  exploitation  of  the  WWW  in  the  context  of  cultural  modeling  is, 
however,  hampered  both  by  the  large-scale  nature  of  the  Web  (which  makes  relevant 
information  difficult  to  locate)  and  the  current  dominance  of  natural  language  formats 
(which  complicates  the  use  of  automated  approaches  to  information  processing).  In 
this  paper,  we  describe  an  approach  to  support  the  use  of  the  Web  in  cultural 
modeling activities. The approach is based on the development of ontology-based
information extraction capabilities, and it combines the use of Semantic Web
technologies and natural language processing (NLP) techniques with an
epidemiological perspective of culture [1] and the use of belief networks to analyze
culture [2]. Technological support for the approach is currently being developed in the
context of the EXTREME¹ project, which is funded by the U.S. Office of Naval
Research. In particular, we are currently developing a Web-based knowledge
extraction  system  that  incorporates  a  variety  of  information  extraction,  NLP  and 
Semantic  Web  technologies.  Such  a  system  may  be  seen  as  an  important  element  of 
an  iterative  approach  to  cultural  modeling:  one  in  which  an  initial  (qualitative) 
characterization  of  the  psycho-cognitive  characteristics  of  a  cultural  group  drives  the 
acquisition  of  information  that  subsequently  enables  cultural  analysts  to  refute, 
validate  and  extend  the  models  developed  at  previous  stages. 

The structure of the paper is as follows. In Section 2 we outline what is meant by
the term ‘culture’, and we describe an approach to cultural modeling that is based on
the development of models representing the ideas associated with particular cultural
groups. In Section 3 we describe our approach to the development of Web-based
knowledge extraction capabilities to support cultural model development. This
approach combines conventional approaches to information extraction with
semantically-enriched representations of cultural models, and it seeks to provide a
culture-oriented ontology-based information extraction (OBIE) capability for the
WWW. Finally, Section 4 presents some conclusions and directions for future work.


2  Cultural  Models  and  Cultural  Network  Analysis 

Before  addressing  the  use  of  the  Web  to  study  culture,  we  first  need  to  define  what  is 
meant by the term ‘culture’. As is to be expected in any highly interdisciplinary field,
there are a variety of conceptions of culture. Our conception is distinctly cognitive in
nature, and it is based on an epidemiological perspective [1]. A fundamental
assumption of this perspective is that shared developmental experiences lead to
important similarities in the mental representations (such as values and causal
knowledge)  that  are  distributed  among  members  of  a  population.  Culturally 
widespread  ideas  ground  the  distribution  of  behavioral  norms,  discussions, 
interpretations,  and  affective  reactions  in  a  population,  and  researchers  working 
within  the  epidemiological  perspective  thus  seek  to  describe  and  explain  the 
prevalence  and  spread  of  ideas  within  populations. 

Working from this perspective, we previously developed a technique called
cultural network analysis (CNA), which is a method for describing the ‘ideas’ that are
shared by members of cultural groups and which guide the decision-making behavior
of group members [2]. CNA discriminates between three kinds of ideas, namely,
concepts, values, and causal beliefs. The cultural models resulting from CNA use
belief network diagrams to show how the set of relevant ideas relate to one another
(see Figure 1 for an example). In general, we can distinguish two types of cultural
models: qualitative and quantitative cultural models [see 2]. Qualitative cultural
models present the ideas associated with a particular group, whereas quantitative
models add information about the prevalence of these ideas in the target population.
In addition to seeing the approach described in this paper as a means to validate and
refine qualitative cultural models, it is also possible to see the approach as enabling a
cultural analyst to estimate the relative frequency of ideas in a target population and
thus develop quantitative extensions of the qualitative cultural models (see Section
3.6).

¹ See http://www.ecs.soton.ac.uk/research/projects/746.


3  Web-Based  Knowledge  Extraction  for  Cultural  Model 
Development 

In this section, we describe an approach to cultural model development that combines
CNA with state-of-the-art approaches to knowledge representation and Web-based
information extraction. The aim is to better enable cultural model developers to
exploit the WWW as a source of culture-relevant information. The approach we
describe is based on a decade of research into OBIE systems [see 3 for a review], and
it combines conventional approaches to information extraction with an ontology that
provides background knowledge about the kinds of entities and relationships that are
deemed important in a cultural modeling context. The first step in the process is to
develop an initial qualitative cultural model using a limited set of knowledge sources
[see 2 for more details on this step]. The second step involves the development of a
cultural ontology using the qualitative cultural model as a reference point. This
ontology is represented using the Web Ontology Language (OWL), which has
emerged as a de facto standard for formal knowledge representation on the WWW.
The third step is to manually annotate sample texts using the cultural ontology in
order to provide a training corpus for rule learning. Rule learning, in the current
context, is mediated by the (LP)² algorithm, which is a supervised algorithm that has
been used to develop a variety of adaptive information extraction and semantic
annotation capabilities [4, 5]. Following the development of information extraction
rules, the rules are then applied to Web resources in the fourth step in order to identify
instances of the entities defined in the initial qualitative cultural model. Step five
consists of the identification and extraction of causal relations. The extraction of
causal relationships is a difficult challenge because techniques for information
extraction tend to focus on the extraction of particular entities in a text, rather than the
relationships between the entities. We attempt to extract causal relationships using an
approach that combines the use of background knowledge in the form of a domain
ontology with the general-purpose lexical database WordNet [6]. Finally, in step 6,
the extracted cultural knowledge is integrated, stored, and used to estimate the relative
frequencies of the various ideas presented in the initial qualitative cultural model. We
briefly describe each of these steps in subsequent sections.


3.1  Step  1:  Develop  Qualitative  Cultural  Model 

The technique used to develop qualitative cultural models has been described in
previous work [2], and we will not reiterate the details of the technique here. Figure 1
illustrates a simplified qualitative cultural model that represents an extremist Sunni
Muslim’s beliefs about current socio-political relationships between Islam and the
West. The set of ideas represented in Figure 1 were extracted from articles that
describe jihadist narratives, and they are presented here for illustrative purposes. The
cultural model illustrates concepts shared by the group, as well as their common
knowledge of the causal relationships between those concepts. This shared knowledge
influences expectations about how socio-political relationships will unfold, and it
provides a basis for the selection of particular actions and decision outcomes: for
example, the decision to support jihad.


Figure  1.  Sunni  extremist  cultural  model  of  jihad  (simplified). 

Figure 1 shows the three different kinds of ideas that are the targets of a
culture-oriented knowledge extraction system. These ideas include simple concepts such as
“Western ideology” and “Muslim Honor”, each represented as closed shapes in Figure
1. It also includes causal beliefs: for example, the idea that Western ideology (e.g.
secularism, nationalism) is inhibiting the formation of a unified Islamic caliphate and
the idea that the West promotes this ideology because it is engaged in a covert war
against Islam. These causal beliefs are represented as arrows in the figure, with the +/-
symbols indicating the polarity of the causal relationship. Finally, Figure 1 portrays
values using specific shapes, with circles indicating “positive” outcomes and
hexagons indicating “negative” outcomes. Developing an Islamic caliphate is thus a
good thing according to the cultural model. Maintaining (and enhancing) Muslim
honor is likewise valued. According to the model, jihad is viewed positively and
should be supported by the model's adherents due to the perceived anticipated
consequences for Muslims. Most directly, support for jihad decreases the chances that
the West will continue its war against Islam, and it enhances collective Muslim honor.
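A qualitative cultural model of this kind can be held in a small data structure: concepts as nodes, causal beliefs as signed edges, and values as valences attached to concepts. The sketch below is illustrative only; the class and function names are hypothetical, and the edges simply transcribe the relationships described in the text rather than the full content of Figure 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CausalBelief:
    cause: str
    effect: str
    polarity: int  # +1: cause increases effect, -1: cause decreases effect


# Edges transcribed from the description of Figure 1 (simplified).
BELIEFS = [
    CausalBelief("war against Islam", "Western ideology", +1),
    CausalBelief("Western ideology", "Islamic caliphate", -1),
    CausalBelief("support for jihad", "war against Islam", -1),
    CausalBelief("support for jihad", "Muslim honor", +1),
]

# Values: +1 for outcomes the model treats as positive.
VALUES = {"Islamic caliphate": +1, "Muslim honor": +1}


def path_polarity(beliefs, path):
    """Multiply edge polarities along a chain of concepts."""
    sign = 1
    for cause, effect in zip(path, path[1:]):
        edge = next(b for b in beliefs if b.cause == cause and b.effect == effect)
        sign *= edge.polarity
    return sign
```

Multiplying polarities along a path gives the net direction of influence that the model attributes to an action, which is the kind of inference an analyst would read off the belief network diagram.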


3.2  Step  2:  Develop  Cultural  Ontology 

Once an initial qualitative cultural model has been developed, the next step in the
process is to develop an ontology that represents the contents of the model. The main
reason this step is undertaken is that it enables the cultural model to be used to
support information extraction. Over recent years, a rich literature has emerged
concerning the use of ontologies in information extraction, and a number of important
tools have been developed to support OBIE [3]. By converting the cultural model into
an ontology using standard knowledge representation languages, such as OWL, we
are able to capitalize on the availability of these pre-existing tools and techniques, and
we are also able to compare the success of our approach with other OBIE approaches.

The ontologies developed to represent the contents of cultural models are based
around the notion of ideas as being divided into concepts, causal beliefs and values.
These three types of ideas constitute the top-level constructs of the ontology, and
subtypes of these constructs are created to represent the kinds of constructs that are
represented in the cultural model. For example, if we consider the notion of ‘Jihad
support’, as depicted in Figure 1, then we can see that this is a type of concept, and it
is regarded as a positive thing, at least from the perspective of the target group.
Within the ontology developed to support this cultural model we have the concept of
‘support-for-jihad’, which is represented as a type of ‘jihad-related-action’, which is
in turn represented as a type of ‘action’, which is in turn represented as a type of
‘concept’. Given the focus of the cultural model in representing causal beliefs, the
notions of actions, events and the causally-significant linkages between these types of
concept are often the most important elements of the cultural ontology.
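The subtype chain described above can be sketched with a plain parent map. This is a stand-in for illustration only: the actual ontology is expressed in OWL, and the dictionary and function names below are hypothetical.

```python
# Each term maps to its direct supertype; 'concept' is a top-level construct.
PARENT = {
    "support-for-jihad": "jihad-related-action",
    "jihad-related-action": "action",
    "action": "concept",
}


def ancestors(term):
    """Return the chain of supertypes of a term, nearest first."""
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


def is_a(term, cls):
    """True if term equals cls or cls is one of its supertypes."""
    return term == cls or cls in ancestors(term)
```

Subsumption checks of this kind are what allow an extraction rule written for a general class such as 'action' to apply to any of its more specific subtypes.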


3.3  Step  3:  Develop  OBIE  Capability 

In this step of the process, the aim is to create rules that automatically detect instances
of the concepts, beliefs and values contained in the cultural model. There are clearly a
number of ways in which this might be accomplished, especially once one considers
the rich array of information extraction techniques and technologies that are currently
available [7], and not all of these ways need to rely on the creation of symbolic rules
(statistical approaches to information extraction have also demonstrated considerable
success [see 7 for a recent review]). However, we prefer an approach that delivers
symbolic extraction rules (i.e. rules that are defined over the linguistic features and
lexical elements of the source texts) because the knowledge contained in the rules can
be easily edited by subject matter experts. In addition, it is easier to provide
explanation-based facilities for rule-based symbolic systems than it is for systems
based on statistical techniques.

The approach to rule creation that we have adopted in the context of the
IEXTREME project is based on the use of the (LP)² learning algorithm, which has
been used to create a number of semantic annotation systems [4, 5]. The basic
approach is to manually annotate a limited number of source texts using the cultural
ontology that was created in the previous step. These annotated texts are then used as
the training corpus for rule induction [see 8 for more details]. During rule induction,
the (LP)² algorithm generalizes from an initial rule that is created from a user-defined
example by using generic shallow knowledge about natural language. This knowledge
is provided by a variety of NLP resources, such as a morphological analyzer, a
part-of-speech (POS) tagger and a gazetteer. The rules that result from the learning process
thus incorporate a variety of lexical and linguistic features. Previous research has
suggested that rules could be defined over a large number of features. For example,
Bontcheva et al. [9] used a variety of NLP tools to generate 94 features over which
information extraction rules could be defined. Of course, not all these features are
likely to be of equal value in creating information extraction systems, and further
empirical studies are required to assess their relative value in the domain of cultural
modeling (see Section 4).
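To make the notion of a symbolic extraction rule concrete, the sketch below applies a hand-written rule defined over lexical and linguistic token features. It does not implement the (LP)² algorithm or its rule induction; the rule format, feature names and example tokens are assumptions chosen purely for illustration.

```python
def token_matches(constraint, token):
    """A token satisfies a constraint if every listed feature agrees."""
    return all(token.get(feature) == value for feature, value in constraint.items())


def apply_rule(rule, tokens):
    """Slide the rule's pattern over the tokens and return labelled spans."""
    pattern = rule["pattern"]
    hits = []
    for start in range(len(tokens) - len(pattern) + 1):
        window = tokens[start:start + len(pattern)]
        if all(token_matches(c, t) for c, t in zip(pattern, window)):
            hits.append((start, start + len(pattern), rule["label"]))
    return hits


# Hypothetical rule: the lemma 'western' followed by any singular noun.
RULE = {
    "label": "western-ideology",
    "pattern": [{"lemma": "western"}, {"pos": "NN"}],
}

# Hypothetical pre-annotated tokens (word, lemma, part of speech).
TOKENS = [
    {"word": "Western", "lemma": "western", "pos": "JJ"},
    {"word": "secularism", "lemma": "secularism", "pos": "NN"},
    {"word": "spreads", "lemma": "spread", "pos": "VBZ"},
]
```

Because the rule is a readable list of feature constraints, a subject matter expert could edit it directly, which is the property the text cites in favour of symbolic rules.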


3.4  Step  4:  Extract  Concepts 

Once extraction rules have been created they can be applied to potential knowledge
sources in order to detect occurrences of the various ideas expressed in the cultural
model. Because most of the user-defined annotations will be based on the nodal
elements of the cultural model networks, such as those seen in Figure 1, this step is
particularly useful for detecting mentions of specific concepts in source texts. In the
case of Sunni extremist cultural models this could, for example, include mentions of
jihad-related concepts, for example ‘Jihad is a means to expel the Western occupiers’,
as well as references to aspects of Western ideology. In general, information
extraction systems based on the machine learning technique described above (i.e. the
(LP)² algorithm) have proved highly effective in identifying instances of the terms
defined in an ontology, so we expect reasonable extraction performance for this step
of the process.


3.5  Step  5:  Extract  Causal  Relations 

There have been a number of attempts to extract relational information in a
Web-based context [see 7]. The use of ontologies in such systems plays an important role
because they provide background knowledge about the possible semantic
relationships that are likely to exist between the various entities identified in previous
processing steps. Thus, if a system first subjects a text resource to entity-based
semantic annotation, then it is able to use the ontology to form expectations about the
kind of relationships that might be apparent in particular text fragments. When this
background knowledge is combined with lexical and linguistic information, a relation
extraction system is often able to identify relationships that would be impossible to
detect based on a text-only analysis.

The approach to relation extraction that we have adopted in the case of the
EXTREME project is based on a technique that was previously developed to support
information extraction in the domain of artists and artistic works [10]. The approach
builds on the outcome of the previous step, which is concerned with the detection of
concepts in the source texts. Importantly, once these concept annotations are in place,
the relation extraction subsystem is provided with a much richer analytic substrate
than would otherwise have been the case. In fact, it is only once such annotations are
in place that the real value of the ontology (for the detection and extraction of
relationships) can be appreciated, for the ontology provides background knowledge
that drives the formation of expectations about the kinds of relationships that could
appear between concepts, and once such expectations have been established, they can
be supported or undermined by subsequent lexical analysis of the sentence in which
the concepts appear.

Obviously, the nature of the natural language processing that is performed on the
sentence is key to this relation-based annotation capability. It is not sufficient for a
system to simply form an expectation about the kind of relationships that might occur
between identified entities in the text; the system also needs to ascertain whether the
linguistic context of the sentence supports the assertion of a particular relationship.
The decision concerning which relationship (if any) to assert in a particular sentential
context is based on a strategy similar to that used in previous research [10].
Essentially, each relationship in the ontology is associated with a ‘synset’ (a set of
synonyms) in the general-purpose lexical database WordNet [6]. When the relation
extraction system executes, it attempts to match the words in a sentence against the
WordNet-based linguistic grounding provided for each expected relationship. In
addition to representing information about synonyms, the WordNet database also
represents hypernymy (superordinate) and hyponymy (subordinate) relationships.
These can be used to support the matching process by avoiding problems due to
transliteration.
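The matching strategy can be sketched with a toy lexicon. The sketch does not call WordNet: the synonym sets and hypernym links below are hand-made, hypothetical stand-ins, and the function names are illustrative. It only shows how expected relationships, each grounded in a set of synonyms, might be checked against the lemmas of a sentence.

```python
# Hypothetical grounding of two ontology relationships in synonym sets.
SYNSETS = {
    "inhibits": {"inhibit", "hinder", "prevent", "block"},
    "promotes": {"promote", "encourage", "advance"},
}

# Hypothetical hypernym links (specific lemma -> more general lemma).
HYPERNYMS = {"sabotage": "hinder"}


def expand(lemma):
    """Return the lemma together with its chain of hypernyms."""
    seen = {lemma}
    while lemma in HYPERNYMS and HYPERNYMS[lemma] not in seen:
        lemma = HYPERNYMS[lemma]
        seen.add(lemma)
    return seen


def match_relation(sentence_lemmas, expected_relations):
    """Return the first expected relation supported by the sentence, if any."""
    for relation in expected_relations:
        for lemma in sentence_lemmas:
            if expand(lemma) & SYNSETS[relation]:
                return relation
    return None
```

The list of expected relations would come from the ontology once the concepts in the sentence have been annotated, so only relationships that are plausible for those concepts are tested against the text.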


3.6  Step 6: Exploit Knowledge Extraction Capability

The knowledge extraction capability outlined in the previous steps provides support
for the refinement, extension and validation of the knowledge contained in cultural
models. The ability to detect instances of the ideas expressed in cultural models
across a range of Web resources (including blogs, organizational websites, discussion
threads and so on) provides a means by which new knowledge sources can be
discovered and made available for a variety of further model development and
refinement activities. The use of OBIE technology therefore provides a means by
which the latent potential of the Web to serve as a source of culture-relevant
knowledge and information can be exploited in the context of qualitative cultural
modeling initiatives.

Aside  from  the  development  of  better  qualitative  cultural  models,  the  use  of 
knowledge  extraction  techniques  can  also  support  the  development  of  quantitative 
cultural  models.  As  discussed  above,  quantitative  cultural  models  extend  qualitative 
cultural  models  by  including  information  about  the  relative  frequencies  of  particular 
ideas within the population to which the cultural model applies [2]. By harnessing the
power  of  OBIE  methods,  the  current  approach  provides  a  means  by  which  ideas  (most 
notably  concepts  and  causal  beliefs)  can  be  detected  across  many  hundreds,  if  not 
thousands,  of  Web  resources.  This  provides  an  estimate  of  the  prevalence  of  particular 
ideas  in  the  target  population  of  interest,  and  it  provides  a  means  by  which  the  Web 
can  be  used  to  support  the  development  of  quantitative  cultural  models. 
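
As a hedged illustration of how such a prevalence estimate might be computed, the sketch below takes the set of ideas detected in each Web resource and reports the fraction of resources in which each idea appears. The function name idea_prevalence, the input format and the choice of a per-resource fraction are assumptions made for this example; the source does not specify how prevalence is calculated.

```python
from collections import Counter


def idea_prevalence(detections):
    """Map each idea to the fraction of resources in which it was detected.

    `detections` holds one collection of idea labels per Web resource.
    (Hypothetical input format, assumed for illustration.)
    """
    total = len(detections)
    if total == 0:
        return {}
    counts = Counter()
    for ideas in detections:
        # Count each idea at most once per resource.
        counts.update(set(ideas))
    return {idea: n / total for idea, n in counts.items()}


# Invented example data: four resources and the ideas detected in each.
example = [
    {"idea_a", "idea_b"},
    {"idea_a"},
    {"idea_c"},
    {"idea_a", "idea_c"},
]
prevalence = idea_prevalence(example)
```

Counting each idea once per resource keeps a single long document from dominating the estimate. Weighting by the number of mentions would be an alternative design.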


4  Conclusions  and  Future  Work 

This  paper  has  described  an  approach  to  harnessing  the  latent  potential  of  the  Web  to 
support  cultural  modeling  efforts.  The  approach  is  based  on  the  development  of 
culture-oriented  knowledge  extraction  capabilities  and  the  use  of  techniques  that 
support  a  cognitive  characterization  of  specific  cultural  groups.  Systems  developed  to 
support  the  approach  may  be  seen  as  an  important  element  of  iterative  cultural 
modeling  efforts:  ones  in  which  an  initial  qualitative  cultural  model  drives  the 
acquisition  of  information  from  a  large  number  of  heterogeneous  Web-based
resources.  This,  in  turn,  supports  the  effective  refinement,  extension  and  validation  of 
cultural  model  content. 

In  terms  of  future  work,  a  prototype  system  is  currently  being  developed  within  the 
context  of  the  EXTREME  project.  This  exemplifies  the  approach  described  herein, 
and  it  also  enables  us  to  address  a  number  of  research  issues.  One  research  issue 
concerns  the  need  to  adapt  the  information  extraction  techniques  so  as  to  optimize 
performance  in  the  domain  of  cultural  analysis.  This  includes  the  need  to  find  the  best
mix  of  linguistic  features  with  respect  to  the  resources  being  analyzed  [see  9].  A
further  issue  for  future  work  concerns  the  extension  of  the  approach  to  support  the 
detection  and  extraction  of  values.  Values,  recall,  constitute  one  of  three  types  of 
ideas  associated  with  cultural  models.  The  approach  presented  here,  however,  is 
clearly  focused  on  the  extraction  of  concepts  and  causal  beliefs  rather  than  values.
Extending  the  approach  to  incorporate  values  may  require  us  to  consider  techniques 
that  have  been  developed  to  support  opinion  mining  and  sentiment  analysis  on  the
WWW  [see  11  for  a  review].


APPENDIX  D.  ACRONYMS

DOD  Department  of  Defense 

CC  Coherence  Consistency 

CD  Coherence  Diversity 

CNA  Cultural  Network  Analysis 

IE  Information  Extraction

IEI  Information  Exchange  Interaction 

IES  Information  Exchange  Separation 

JA  Judgment  Authority 

JI  Judgment  Independence 

KC  Knowledge  Change 

KM  Knowledge  Maintenance 

LP2  Learning  Patterns  via  Language  Processing 

OBIE  Ontology-Based  Information  Extraction

OVA  Ontology  Viewer  Application 

OWL  Ontology  Web  Language/Web  Ontology  Language 

PMIs  Polarizing  Metacognitive  Ideas 

POS  Part  of  Speech 

SBP  Social  Computing,  Behavioral-Cultural  Modeling,  and  Prediction 

W3C  World  Wide  Web  Consortium